Abstract. We can approximately recover a sparse signal with limited noise, i.e., a vector of length
d with at least $d - m$ zeros or near-zeros, using little more than $m \log(d)$ nonadaptive linear
measurements rather than the d measurements needed to recover an arbitrary signal of length d. Several
research communities are interested in techniques for measuring and recovering such signals, and a
variety of approaches have been proposed. We focus on two important properties of such algorithms.

• Uniformity. A single measurement matrix should work simultaneously for all signals. 

• Computational Efficiency. The time to recover such an m-sparse signal should be close to
the obvious lower bound, $m \log(d/m)$.

To date, algorithms for signal recovery that provide a uniform measurement matrix with approximately
the optimal number of measurements, such as those first proposed by Donoho and his collaborators
and, separately, by Candes and Tao, are based on linear programming and require time poly(d)
instead of m polylog(d). On the other hand, fast decoding algorithms to date from the Theoretical
Computer Science and Database communities fail with probability at least 1/poly(d), whereas we
need failure probability no more than around $1/d^m$ to achieve a uniform failure guarantee.

This paper develops a new method for recovering m-sparse signals that is simultaneously uniform
and quick. We present a reconstruction algorithm whose run time, $O(m \log^2(m) \log^2(d))$, is sublinear
in the length d of the signal. The reconstruction error is within a logarithmic factor (in m) of the
optimal m-term approximation error in $\ell_1$. In particular, the algorithm recovers m-sparse signals
perfectly and noisy signals are recovered with polylogarithmic distortion. Our algorithm makes
$O(m \log^2(d))$ measurements, which is within a logarithmic factor of optimal. We also present a
small-space implementation of the algorithm.

These sketching techniques and the corresponding reconstruction algorithms provide an algorithmic
dimension reduction in the $\ell_1$ norm. In particular, vectors of support m in dimension d
can be linearly embedded into $O(m \log^2 d)$ dimensions with polylogarithmic distortion. We can
reconstruct a vector from its low-dimensional sketch in time $O(m \log^2(m) \log^2(d))$. Furthermore,
this reconstruction is stable and robust under small perturbations.



1. Introduction 

We say that a metric space $(X, d_X)$ embeds into a metric space $(Y, d_Y)$ with distortion D if there
are positive numbers A, B such that $B/A \le D$ and a map $\Phi : X \to Y$ such that

$$A\, d_X(x,y) \le d_Y(\Phi(x), \Phi(y)) \le B\, d_X(x,y) \quad \text{for all } x, y \in X. \tag{1.1}$$

A fundamental problem is to understand when a finite metric space, which is isometrically embedded
in some normed space X, admits a dimension reduction; i.e., when we can embed it in
an appropriate normed space Y of low dimension. Dimension reduction techniques enjoy a wide
variety of algorithmic applications, including data stream computations [CM03, GGI+02a] and
approximate searching for nearest neighbors [IN05] (to cite just a few). The dimension reduction
result of Johnson and Lindenstrauss [JL84] is a fundamental one. It states that any set of N points
in $\ell_2$ can be embedded in $\ell_2^n$ with distortion $(1 + \epsilon)$, where the dimension is $n = O(\log(N)/\epsilon^2)$.

A similar problem in the $\ell_1$ space had been a longstanding open problem; Brinkman and
Charikar [BC03] solved it in the negative (see another example in [JNL04]). There exists a set
of N points in $\ell_1$ such that any embedding of it into $\ell_1^n$ with distortion D requires $n = N^{\Omega(1/D^2)}$
dimensions. Thus, a dimension reduction in the $\ell_1$ norm with constant distortion is not possible.
However, it is well known how to do such a dimension reduction with a logarithmic distortion. One
first embeds any N-point metric space into $\ell_2$ with distortion $O(\log N)$ using Bourgain's theorem
[Bou85], then does dimension reduction in $\ell_2$ using the Johnson-Lindenstrauss result [JL84], and finally
embeds $\ell_2^n$ into $\ell_1^{2n}$ with constant distortion using Kashin's theorem ([Kas77], see Corollary 2.4 in
[Pis89]). For linear embeddings, even distortions of polylogarithmic order are not achievable.
Indeed, Charikar and Sahai [CS02] give an example for which any linear embedding into $\ell_1^n$ incurs
a distortion $\Omega(\sqrt{N/n})$.

Two fundamental questions arise from the previous discussion. 

(1) What are spaces for which a dimension reduction in the $\ell_1$ norm is possible with constant
distortion?

(2) What are spaces for which a linear dimension reduction in the $\ell_1$ norm is possible with
constant or polylogarithmic distortion?

One important space which addresses question (2) positively consists of all vectors of small
support. Charikar and Sahai [CS02] prove that the space of vectors of support m in dimension
d can be linearly embedded into $\ell_1^n$ with distortion $1 + \epsilon$ with respect to the $\ell_1$ norm, where
$n = O((m/\epsilon)^2 \log d)$ (Lemma 1 in [CS02]). They do not, however, give a reconstruction algorithm
for such signals, and their particular embedding does not lend itself to an efficient algorithm.

The main result of our paper is an algorithmic linear dimension reduction for the space of vectors
of small support. The algorithm runs in sublinear time and is stable.

Theorem 1. Let Y be a set of points in $\mathbb{R}^d$ endowed with the $\ell_1$ norm. Assume that each point
has non-zero coordinates in at most m dimensions. Then these points can be linearly embedded
into $\ell_1$ with distortion $O(\log^2(d) \log^3(m))$, using only $O(m \log^2 d)$ dimensions. Moreover, we can
reconstruct a point from its low-dimensional sketch in time $O(m \log^2(m) \log^2(d))$.

This dimension reduction reduces the quadratic order of m in [CS02] to a linear order. Our
embedding does, however, incur a distortion of polylogarithmic order. In return for this polylogarithmic
distortion, we gain an algorithmic linear dimension reduction: there exists a sublinear time
algorithm that can reconstruct every vector of small support from its low-dimensional sketch.

The space of vectors of support m in dimension d is a natural and important space as it models
closely the space of compressible signals. A compressible signal is a long signal that can be represented
with an amount of information that is small relative to the length of the signal. Many
classes of d-dimensional signals are compressible, e.g.,

• The m-sparse class $B_0(m)$ consists of signals with at most m nonzero entries.

• For $0 < p < 1$, the weak $\ell_p$ class $B_{\text{weak-}p}(r)$ contains each signal f whose entries, sorted by
decaying magnitude, satisfy $|f|_{(i)} \le r\, i^{-1/p}$.

These types of signals are pervasive in applications. Natural images are highly compressible, as are 
audio and speech signals. Image, music, and speech compression algorithms and coders are vital 
pieces of software in many technologies, from desktop computers to MP3 players. Many types of
automatically-generated signals are also highly redundant. For example, the distribution of bytes 
per source IP address in a network trace is compressible — just a few source IP addresses send the 
majority of the traffic. 

One important algorithmic application of our dimension reduction is the reconstruction of compressible
signals. This paper describes a method for constructing a random linear operator $\Phi$ that
maps each signal f of length d to a sketch of size $O(m \log^2 d)$. We exhibit an algorithm called
Chaining Pursuit that, given this sketch and the matrix $\Phi$, constructs an m-term approximation
of the signal with an error that is within a logarithmic factor (in m) of the optimal m-term
approximation error. A compressible signal is well-approximated by an m-sparse signal, so the output of
Chaining Pursuit is a good approximation to the original signal, in addition to being a compressed
representation of the original signal. Moreover, this measurement operator succeeds simultaneously
for all signals with high probability. In many of the above application settings, we have resource-poor
encoders which can compute a few random dot products with the signal but cannot store
the entire signal nor take many measurements of the signal. The major innovation of this result is
to combine sublinear reconstruction time with stable and robust linear dimension reduction of all
compressible signals.

Let $f_m$ denote the best m-term representation for f; i.e., $f_m$ consists of f restricted to the m
positions that have largest-magnitude coefficients.
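
As a small illustration of this definition, the following sketch computes the best m-term representation and the corresponding optimal error in the $\ell_1$ norm. The names `best_m_term` and `l1_norm` are ours, not the paper's, and ties between equal magnitudes are broken in favor of the lower index, which is an arbitrary choice.

```python
def best_m_term(f, m):
    """Return f restricted to its m largest-magnitude positions."""
    # Sort positions by decreasing magnitude; the sort is stable, so ties
    # keep the lower index first (an arbitrary convention for this sketch).
    order = sorted(range(len(f)), key=lambda i: -abs(f[i]))
    keep = set(order[:m])
    return [f[i] if i in keep else 0.0 for i in range(len(f))]


def l1_norm(x):
    """Sum of absolute values of the entries of x."""
    return sum(abs(v) for v in x)


f = [0.1, -5.0, 0.0, 2.0, -0.3, 7.0]
f_m = best_m_term(f, 2)                                 # keeps the entries -5.0 and 7.0
opt_err = l1_norm([a - b for a, b in zip(f, f_m)])      # optimal 2-term error
```

The quantity `opt_err` plays the role of the optimal m-term approximation error against which the reconstruction error is compared.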

Theorem 2. With probability at least $1 - O(d^{-3})$, the random measurement operator $\Phi$ has the
following property. Suppose that f is a d-dimensional signal whose best m-term approximation with
respect to the $\ell_1$ norm is $f_m$. Given the sketch $V = \Phi f$ of size $O(m \log^2(d))$ and the measurement
matrix $\Phi$, the Chaining Pursuit algorithm produces a signal $\hat{f}$ with at most m nonzero entries. The
output $\hat{f}$ satisfies

$$\|f - \hat{f}\|_1 \le C(1 + \log m)\, \|f - f_m\|_1. \tag{1.2}$$

In particular, if $f_m = f$, then also $\hat{f} = f$. The time cost of the algorithm is $O(m \log^2(m) \log^2(d))$.

Corollary 3. The factor log m is intrinsic to this approach. However, the proof gives a stronger
statement: the approximation in the weak-1 norm holds without that factor, $\|f - \hat{f}\|_{\text{weak-}1} \le C \|f - f_m\|_1$.
This follows directly from the definition of the weak norm and our proof, below.

Corollary 4. Our argument shows that the reconstruction $\hat{f}$ is not only stable with respect to noise
in the signal, as Equation (1.2) shows, but also with respect to inaccuracy in the measurements.
Indeed, a stronger inequality holds. For every V (not necessarily the sketch $\Phi f$ of f), if $\hat{f}$ is the
reconstruction from V (not necessarily from $\Phi f$), we have

$$\|f_m - \hat{f}\|_1 \le C(1 + \log m) \left( \|f - f_m\|_1 + \|\Phi f - V\|_1 \right).$$

1.1. Related Work. The problem of sketching and reconstructing m-sparse and compressible
signals has several precedents in the Theoretical Computer Science literature, especially the paper
[CM03] on detecting heavy hitters in nonnegative data streams and the works [GGI+02b, GMS05]
on Fourier sampling. More recent papers from Theoretical Computer Science include [CM05,
CRTV05]. Sparked by the papers [Don04] and [CT04], the computational harmonic analysis and
geometric functional analysis communities have produced an enormous amount of work, including
[CRT04, Don05, DT05, CT05, RV05, TG05, MPTJ05].

Most of the previous work has focused on a reconstruction algorithm that involves linear programming
(as first investigated and promoted by Donoho and his collaborators) or second-order
cone programming [Don04, CT04, CRTV05]. The authors of these papers do not report computation
times, but they are expected to be cubic in the length d of the signal. This cost is high, since
we are seeking an approximation that involves O(m) terms. The paper [TG05] describes another
algorithm with running time of order $O(m^2 d \log d)$, which can be reduced to $O(m d \log d)$ in certain
circumstances. None of these approaches is comparable with the sublinear algorithms described
here.

There are a few sublinear algorithms available in the literature. The Fourier sampling paper
[GMS05] can be viewed as a small space, sublinear algorithm for signal reconstruction. Its primary
shortcoming is that the measurements are not uniformly good for the entire signal class. The recent
work [CM05] proposes some other sublinear algorithms for reconstructing compressible signals. Few
of these algorithms offer a uniform guarantee. The ones that do require more measurements,
$O(m^2 \log d)$ or worse, which means that they are not sketching the signal as efficiently as possible.

Table 1 compares the major algorithmic contributions. Some additional comments on this table
may help clarify the situation. If the signal is f and the output is $\hat{f}$, let $E = E(f) = f - \hat{f}$ denote
the error vector of the output and let $E_{\mathrm{opt}} = E_{\mathrm{opt}}(f) = f - f_m$ denote the error vector for the
optimal output. Also, let $C_{\mathrm{opt}} = C_{\mathrm{opt}}(f)$ denote $\max_g E_{\mathrm{opt}}(g)$, where g is the worst possible signal
in the class where f lives.

1.2. Organization. In Section 2 we provide an overview of determining a sketch of the signal f.
In Section 3 we give an explicit construction of a distribution from which the random linear map $\Phi$
is drawn. In Section 4 we detail the reconstruction algorithm, Chaining Pursuit, and in Section 5
we give an analysis of the algorithm, proving our main result. In Section 6 we use our algorithmic
analysis to derive a dimension reduction in the $\ell_1$ norm for sparse vectors.

2. Sketching the Signal 

This section describes a linear process for determining a sketch V of a signal f. Linearity
is essential for supporting additive updates to the signal. Not only is this property important 
for applications, but it arises during the iterative algorithm for reconstructing the signal from 
the sketch. Linearity also makes the computation of the sketch straightforward, which may be 
important for modern applications that involve novel measurement technologies. 

2.1. Overview of sketching process. We will construct our measurement matrix by combining
simple matrices and ensembles of matrices. Specifically, we will be interested in restricting a signal
f to a subset A of its d positions and then restricting to a smaller subset $B \subset A$, and it will be
convenient to analyze separately the two stages of restriction. If P and Q are 0-1 matrices, then
each row $P_i$ of P and each row $Q_j$ of Q restricts f to a subset by multiplicative action, $P_i f$ and $Q_j f$,
and sequential restrictions are given by $P_i Q_j f = Q_j P_i f$. We use the following notation, similar
to [CM05].

Definition 5. Let P be a p-by-d matrix and Q a q-by-d matrix, with rows $\{P_i : 0 \le i < p\}$ and
$\{Q_j : 0 \le j < q\}$, respectively. The row tensor product $S = P \otimes_r Q$ of P and Q is a pq-by-d matrix
whose rows are $\{P_i Q_j : 0 \le i < p,\ 0 \le j < q\}$, where $P_i Q_j$ denotes the componentwise product of
two vectors of length d.

The order of the rows in $P \otimes_r Q$ will not be important in this paper. We will sometimes index
the rows by the pair (i, j), where i indexes P and j indexes Q, so that $P \otimes_r Q$ applied to a vector
x yields a $q \times p$ matrix.

Formally, the measurement operator $\Phi$ is a row tensor product $\Phi = B \otimes_r A$. Here, A is an
$O(m \log d) \times d$ matrix called the isolation matrix and B is an $O(\log d) \times d$ matrix called the bit test
matrix. The measurement operator applied to a signal f produces a sketch $V = \Phi f$, which we can
regard as a matrix with dimensions $O(m \log d) \times O(\log d)$. Each row of V as a matrix contains the
result of the bit tests applied to a restriction of the signal f by a row of A. We will refer to each
row of the data matrix as a measurement of the signal.
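
To make the row tensor product concrete, here is a minimal sketch on a toy instance with d = 4. The helper names `row_tensor` and `matvec`, the tiny matrices, and the particular row ordering are illustrative assumptions; the text notes that the order of the rows is unimportant.

```python
def row_tensor(P, Q):
    """Row tensor product: one row P_i * Q_j (componentwise) per pair (i, j)."""
    return [[pi * qj for pi, qj in zip(Pi, Qj)] for Pi in P for Qj in Q]


def matvec(M, f):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, f)) for row in M]


# Toy isolation matrix: positions {0, 1} go to one subset, {2, 3} to another.
A = [[1, 1, 0, 0],
     [0, 0, 1, 1]]
# Toy bit test matrix: an all-ones row and one row selecting odd positions.
B = [[1, 1, 1, 1],
     [0, 1, 0, 1]]
f = [3.0, 0.0, 0.0, -2.0]

Phi = row_tensor(A, B)            # rows ordered by (row of A, row of B)
V = matvec(Phi, f)
# View the sketch as a matrix: one row of bit-test results per row of A.
sketch = [V[i * len(B):(i + 1) * len(B)] for i in range(len(A))]
```

Each row of `sketch` is one measurement in the sense used above: the bit tests applied to the signal restricted by a single row of the isolation matrix.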
| Approach, References | Signal Class | Uniform | Error bd. | # Measurements | Storage | Decode time |
|---|---|---|---|---|---|---|
| OMP + Gauss [TG05] | m-sparse | No | No error | $m \log d$ | $md \log d$ | $m^2 d \log d$ |
| Group testing [CM06] | m-sparse | No | No error | $m \log^2 d$ | $\log d$ | $m \log^2 d$ |
| $\ell_1$ min. + Gauss [Don06, DT05, DT06] | m-sparse | Yes | No error | No closed form | $\Omega(md)$ | LP(md) |
| Group testing [CM06] | m-sparse | Yes | No error | $m^2 \log^2 d$ | $m \log(d/m)$ | $m^2 \log^2 d$ |
| Group testing [CM06] | weak $\ell_p$ | Yes | $\lVert E \rVert_2 \le C\, C_{\mathrm{opt},p}$ | $m^{(3-p)/(1-p)} \log^2 d$ | $m^{(2-p)/(1-p)} \log d$ | $m^{(4-2p)/(1-p)} \log^3 d$ |
| $\ell_1$ min. + Gauss [CT04, CDD06] | Arbitrary | Yes | $\lVert E \rVert_2 \le m^{-1/2} \lVert E_{\mathrm{opt}} \rVert_1$ | $m \log(d/m)$ | $d \log(d/m)$ | LP(md) |
| $\ell_1$ min. + Fourier [CT04, RV06, CDD06] | Arbitrary | Yes | $\lVert E \rVert_2 \le m^{-1/2} \lVert E_{\mathrm{opt}} \rVert_1$ | $m \log^4 d$ | $m \log^5 d$ | $d \log d$ (empirical) |
| Chaining Pursuit [GSTV06] | Arbitrary | Yes | $\lVert E \rVert_{\text{weak-}1} \le \lVert E_{\mathrm{opt}} \rVert_1$ and $\lVert E \rVert_1 \le \log(m) \lVert E_{\mathrm{opt}} \rVert_1$ | $m \log^2 d$ | $d \log^2 d$ | $m \log^2 d$ |
| Fourier sampling [GGI+02b, GMS05] | Arbitrary | No | $\lVert E \rVert_2 \le \lVert E_{\mathrm{opt}} \rVert_2$ | $m$ polylog $d$ | $m$ polylog $d$ | $m$ polylog $d$ |
| Group testing [CM06] | Arbitrary | No | $\lVert E \rVert_2 \le \lVert E_{\mathrm{opt}} \rVert_2$ | $m \log^{5/2} d$ | $\log^2 d$ | $m \log^{5/2} d$ |

Notes: Above, LP(md) denotes resources needed to solve a linear program with $\Omega(md)$ variables, plus minor overhead. We suppress big-O
notation for legibility.

Table 1. Comparison of algorithmic results for compressed sensing



2.2. The isolation matrix. The isolation matrix A is a 0-1 matrix with dimensions $O(m \log d) \times d$
and a hierarchical structure. Let a be a sufficiently large constant, to be discussed in the next two
sections. The Chaining Pursuit algorithm makes $K = 1 + \log_a m$ passes (or "rounds") over the
signal, and it requires a different set of measurements for each pass. The measurements for the kth
pass are contained in the $O(m k \log(d)/2^k) \times d$ submatrix $A^{(k)}$. During the kth pass, the algorithm
performs $T_k = O(k \log d)$ trials. Each trial t is associated with a further submatrix $A^{(k)}_t$, which has
dimensions $O(m/2^k) \times d$.

In summary,

$$A = \begin{bmatrix} A^{(1)} \\ A^{(2)} \\ \vdots \\ A^{(K)} \end{bmatrix} \quad \text{where} \quad A^{(k)} = \begin{bmatrix} A^{(k)}_1 \\ A^{(k)}_2 \\ \vdots \\ A^{(k)}_{T_k} \end{bmatrix}.$$
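
As a quick consistency check on these dimensions, one can sum the rows over all passes and trials. The calculation below is only a sketch: it writes $T_k$ for the number of trials in pass k, which the text gives as $O(k \log d)$, and absorbs all hidden constants into the O-notation.

```latex
\[
  \sum_{k=1}^{K} T_k \cdot O\!\left(\frac{m}{2^k}\right)
  = O(m \log d) \sum_{k=1}^{K} \frac{k}{2^k}
  \le O(m \log d) \cdot 2
  = O(m \log d),
\]
```

since $\sum_{k \ge 1} k/2^k = 2$. This agrees with the stated $O(m \log d) \times d$ size of the isolation matrix.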



Each trial submatrix $A^{(k)}_t$ encodes a random partition of the d signal positions into $O(m/2^k)$
subsets. That is, each signal position is assigned uniformly at random to one of $O(m/2^k)$ subsets.
So the matrix contains a 1 in the (i, j) position if the jth component of the signal is assigned to
subset i. Therefore, the submatrix $A^{(k)}_t$ is a 0-1 matrix in which each column has exactly one 1,
e.g.,

$$\begin{bmatrix} 0 & 1 & 0 & 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & 0 & 1 \end{bmatrix}.$$

The trial submatrix can also be viewed as a random linear hash function from a space of d keys
onto a set of $O(m/2^k)$ buckets.
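
The following sketch builds one such trial submatrix from a fully random assignment of positions to buckets. The function name `trial_submatrix` and the fixed seed are illustrative assumptions; the small-space construction of Section 3 replaces the fully random assignment with a limited-independence one.

```python
import random


def trial_submatrix(d, buckets, rng):
    """Random partition of d positions into `buckets` subsets, as a 0-1 matrix."""
    assign = [rng.randrange(buckets) for _ in range(d)]   # bucket of each position
    M = [[0] * d for _ in range(buckets)]
    for j, i in enumerate(assign):
        M[i][j] = 1                # position j is assigned to subset i
    return M, assign


rng = random.Random(0)             # fixed seed only to make the example repeatable
M, assign = trial_submatrix(d=7, buckets=3, rng=rng)
column_sums = [sum(M[i][j] for i in range(3)) for j in range(7)]
```

Every entry of `column_sums` equals 1, which is the "exactly one 1 per column" property; storing `assign` alone is equivalent to storing the matrix, which is the hash-function view.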

2.3. The bit test matrix. Formally, the matrix B consists of a row e of 1's and other rows given
by a 0-1 matrix $B_0$, which we now describe. The matrix $B_0$ has dimensions $\lceil \log_2 d \rceil \times d$. The ith
column of $B_0$ is the binary expansion of i. Therefore, the componentwise product of the ith row of
$B_0$ with f yields a copy of the signal f with the components that have bit i equal to one selected
and the others zeroed out.

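An example of a bit test matrix with d = 8 is

$$B = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 0 & 0 & 0 & 0 & 1 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 & 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 1 & 0 & 1 & 0 & 1 \end{bmatrix}.$$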
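
A bit test matrix can be generated on the fly, which is one way to see why it needs no storage. The sketch below is a minimal illustration; the function name `bit_test_matrix` and the choice to list the most significant bit first are assumptions made here.

```python
def bit_test_matrix(d):
    """All-ones row followed by the rows of B0, most significant bit first."""
    nbits = max(1, (d - 1).bit_length())      # equals ceil(log2 d) for d >= 2
    rows = [[1] * d]
    for b in reversed(range(nbits)):
        # Column j of B0 holds the binary expansion of j.
        rows.append([(j >> b) & 1 for j in range(d)])
    return rows


B = bit_test_matrix(8)
```

For d = 8 this produces four rows: the all-ones row and three rows whose columns spell out the numbers 0 through 7 in binary.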









2.4. Storage costs. The bit test matrix requires no storage. The total storage for the isolation
matrix is $O(d \log d)$. The space required for the isolation matrix is large, but this space can
conceivably be shared among several instances of the problem. In Section 3 we give an alternate
construction in which a pseudorandom isolation matrix is regenerated as needed from a seed of
size $m \log^2(d)$; in that construction only the seed needs to be stored, so the total storage cost is
$m \log^2(d)$.

2.5. Encoding time. The time cost for measuring a signal is $O(\log^2(m) \log^2(d))$ per nonzero
component. This claim follows by observing that a single column of A contains $O(\log^2(m) \log(d))$
nonzero entries, and we must apply A to each of $O(\log d)$ restrictions of the signal, one for each
row of B. Note that this argument assumes random access to the columns of the isolation matrix.
We will use this encoding time calculation when we determine the time costs of the Chaining
Pursuit algorithm. In Section 3 we give an alternative construction for A that reduces the storage
requirements at the cost of slightly increased time requirements. Nevertheless, in that construction,
any m columns of A can be computed in time $m^{o(1)}$ each, where o(1) denotes a quantity that tends
to 0 as both m and d get large. This gives measurement time $m^{o(1)}$ per nonzero component.

3. Small Space Construction 

We now discuss a small space construction of the isolation matrix, A. The goal is to specify
a pseudorandom matrix A from a small random seed, to avoid the $\Omega(d \log d)$ cost of storing A
explicitly. We then construct entries of A as needed, from the seed. If we were to use a standard
pseudorandom number generator without further thought, however, the time to construct an entry
of A might be $\Omega(m)$, compared with O(1) for a matrix that is fully random and explicitly stored.
We will give a construction that addresses both of these concerns.

As discussed in Section 2.2, the matrix A consists of polylog(d) submatrices that are random
partitions of the d signal positions into $O(m_k)$ subsets. In this section, we give a new construction
for each submatrix; the submatrices fit together to form A in the same way as in Section 2.2. We
will see from the analysis in Section 5 that the partition map of each random submatrix need only be
$m_k$-wise independent; full independence is not needed, as we need only control the allocation of
spikes into measurements in each submatrix. It follows that we need only construct a family of d
random variables that are $m_k$-wise independent and take values in $\{0, \ldots, r-1\}$ for any given $r \le d$.
Our goal is to reduce the storage cost from $O(d \log d)$ to m polylog(d) without unduly increasing
the computation time. It will require time $\Omega(m)$ to compute the value of any single entry in the
matrix, but we will be able to compute any submatrix of m columns (which is all zeros except for
one 1 per column) in total time m polylog(d). That is, the values of any m random variables can be
computed in time m polylog(d). As in Theorem 22 below, our construction will be allowed to fail
with probability $1/d^3$, which will be the case. (Note that success probability $1 - e^{-cm \log d}$ is not
required.) Our construction combines several known constructions from [AHU83, CLRS01]. For
completeness, we sketch details.

3.1. Requirements. To ease notation, we consider only the case of $m_k = m$. Our goal is to
construct a function $f_s : \{0, \ldots, d-1\} \to \{0, \ldots, r-1\}$, where s is a random seed. The construction
should "succeed" with probability at least $1 - 1/d^3$; the remaining requirements only need to hold
if the construction succeeds. The function should be uniform and m-wise independent, meaning,
for any m distinct positions $0 \le i_1, \ldots, i_m < d$ and any m targets $t_1, \ldots, t_m$, we have

$$\Pr_s\left( \forall j:\ f_s(i_j) = t_j \right) = r^{-m},$$

though the distribution on m + 1 random variables may otherwise be arbitrary. Finally, given any
list A of m positions, we need to be able to compute $\{f_s(j) : j \in A\}$ in time m polylog(d).

3.2. Construction. Let $s = (s_0, s_1, \ldots, s_K)$ be a sequence of $K \le O(\log d)$ independent, identically
distributed random bits. Let p be a prime with $p > 2r$ and $d < p < \mathrm{poly}(d)$. Define the map
$g_s^k : \mathbb{Z}_p \to \mathbb{Z}_p$, which uses the kth element $s_k$ from the seed s and maps $j \in \mathbb{Z}_p$ uniformly at random
to a point $g_s^k(j) \in \mathbb{Z}_p$. The map $g_s^k$ is a random polynomial of degree m - 1 over the field with p
elements. If

$$0 \le g_s^0(j) < r \lfloor p/r \rfloor,$$

where $r \lfloor p/r \rfloor$ represents the largest multiple of r that is at most p, then define

$$f_s(j) = \lfloor g_s^0(j)\, r / p \rfloor = h(g_s^0(j)).$$

The function $h : \{0, \ldots, r \lfloor p/r \rfloor - 1\} \to \{0, \ldots, r-1\}$ is a function that is exactly $\lfloor p/r \rfloor$-to-1. If
$g_s^0(j) \ge r \lfloor p/r \rfloor$, then we map j to $\mathbb{Z}_p$ by $g_s^1(j)$, which is independent of $g_s^0$ and identically distributed.
We repeat the process until j gets mapped to $\{0, \ldots, r \lfloor p/r \rfloor - 1\}$ or we exhaust $K \le O(\log d)$
repetitions. For computational reasons, for each k, we will compute $g_s^k$ at once on all values in a list
A of m values, and we will write $g_s^k(A)$ for the list $\{g_s^k(j) : j \in A\}$. Figure 1 gives a formal algorithm.



Figure 1. Top-level description of m-wise independent random variables.

Algorithm: Hashing

Parameters: m, r, d, K

Input: List A of m values in $\mathbb{Z}_p$; pseudorandom seed s.

Output: List B of m values in $\{0, \ldots, r-1\}$, representing $\{f_s(j) : j \in A\}$.

Compute $g_s^k(A)$ for k = 0, 1, 2, ..., K - 1.

If for some $j \in A$, for all k < K, we have $g_s^k(j) \ge r \lfloor p/r \rfloor$, then FAIL
For $j \in A$
    $k_j = \min\{k : g_s^k(j) < r \lfloor p/r \rfloor\}$
    $f_s(j) = h(g_s^{k_j}(j))$.
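
The hashing procedure can be sketched as follows. This is an illustration under stated assumptions, not the implementation analyzed in the text: it evaluates each polynomial pointwise by Horner's rule (so it costs O(m) per point rather than using fast multipoint evaluation), it takes the seed to be K explicit coefficient vectors, and it uses reduction modulo r as one concrete choice of a map h that is exactly $\lfloor p/r \rfloor$-to-1. The names `poly_eval`, `make_seed`, and `hash_positions` and the toy parameters are ours.

```python
import random


def poly_eval(coeffs, x, p):
    """Horner evaluation of a polynomial (lowest degree first) modulo p."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc


def make_seed(m, K, p, rng):
    """K independent random polynomials of degree at most m - 1 over Z_p."""
    return [[rng.randrange(p) for _ in range(m)] for _ in range(K)]


def hash_positions(A, seed, r, p):
    """Map each j in A to {0, ..., r-1}; return None if the construction fails."""
    limit = r * (p // r)               # largest multiple of r that is at most p
    out = []
    for j in A:
        for coeffs in seed:            # try g^0, g^1, ... in order
            g = poly_eval(coeffs, j, p)
            if g < limit:
                out.append(g % r)      # assumed h: exactly (p // r)-to-1
                break
        else:
            return None                # every repetition landed at or above the limit
    return out


p, r, m, K = 101, 8, 4, 20             # toy parameters: p prime and p > 2r
seed = make_seed(m, K, p, random.Random(1))
buckets = hash_positions([3, 17, 42, 99], seed, r, p)
```

Only the seed (K times m numbers below p) is stored; the bucket of any position is recomputed from it on demand.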



3.3. Correctness.

Lemma 6. Our construction of $f_s : \{0, \ldots, d-1\} \to \{0, \ldots, r-1\}$ produces a uniform m-wise
independent partition with probability at least $1 - d^{-3}$.



Proof. The proof of the correctness of our construction is a standard argument, which we sketch for
completeness. First, there is a prime p with $d < p < \mathrm{poly}(d)$. Next, let us consider the construction
of $g_s^k$. Because the definition of $g_s^k$ is independent of k, we drop the s and k and write g for
simplicity. Because g is a random polynomial of degree m - 1 over the field of p elements, we can
view the construction of g as the multiplication of a vector c of length m (the coefficients of g) by
the Vandermonde matrix

$$V = \begin{pmatrix} 1 & 1 & 1 & 1 & \cdots & 1 \\ 0 & 1 & 2 & 3 & \cdots & p-1 \\ 0 & 1 & 2^2 & 3^2 & \cdots & (p-1)^2 \\ \vdots & & & & & \vdots \\ 0 & 1 & 2^{m-1} & 3^{m-1} & \cdots & (p-1)^{m-1} \end{pmatrix}.$$

Thus we obtain $cV = (g(0), g(1), \ldots, g(p-1))$. If A is a list of m positions, then g(A) is $cV_A$,
where $V_A$ is the submatrix of V obtained by selecting columns according to A. Since $V_A$ is a square
Vandermonde matrix over a field, it is invertible. It follows that, as c varies, $cV_A$ varies over all of
$\mathbb{Z}_p^m$, hitting each element exactly once.

Next, $g(j) \ge r \lfloor p/r \rfloor$ with probability at most $r/p < 1/2$. It follows that, for some k < K, we
have, with probability at least $1 - 2^{-K}$, that $g(j) < r \lfloor p/r \rfloor$. For sufficiently large $K \le O(\log d)$,
the probability is at least $1 - 1/d^4$. Taking a union bound over all d possible j's, the construction
succeeds with probability at least $1 - 1/d^3$.

It is easy to check that, by construction, $f_s(A)$ is uniform on $\{0, \ldots, r-1\}^m$ conditioned on the
construction succeeding. □



3.4. Efficiency.

Lemma 7. Given an arbitrary set A of m positions in $\mathbb{Z}_p$ and a degree m - 1 polynomial g, we
can evaluate g on A in time O(m polylog(d)).



Proof. Evaluating g on the set A is known as the multipoint polynomial evaluation (MPE) problem.
We recall that the MPE problem can be reduced to polylog(m) polynomial multiplications [AHU83]
and that we can multiply polynomials efficiently using the FFT algorithm. We observe that the
time to multiply polynomials is m polylog(d), as we may multiply polynomials by convolving their
coefficients (via the FFT algorithm) over $\mathbb{C}$ and then quantizing and reducing the result modulo p.
We note that arithmetic modulo p takes time at most polylog(d).

Let us now review the MPE problem. Recall that we wish to evaluate g on the set A, g(A).
The evaluation of g(x) at some point x = t is equivalent to finding $g \bmod (x - t)$, since we can
write $g(x) = q(x)(x - t) + r$ by the division theorem. To compute the remainders $g \bmod (x - a)$
for each $a \in A$, let us assume that |A| is a power of 2 (padding if necessary); then we order
$A = \{a_i\}$ arbitrarily and form a binary tree in which the kth node at depth j corresponds to
the subset $A_{j,k} = \{a_i : km/2^j \le i < (k+1)m/2^j\} \subset A$. Once we have formed the binary tree,
we compute the polynomials $p_{j,k}(x) = \prod_{a_i \in A_{j,k}} (x - a_i)$ at each node. We also define $g_{j,k}$ at each
node by $g_{j,k} = g \bmod p_{j,k}$. Our goal is to compute $g_{\lg m, k} = g \bmod p_{\lg m, k}$ for all k, i.e., reduce
g modulo each polynomial in a leaf of the tree. To do this, we start with $g = g_{0,0} = g \bmod p_{0,0}$,
i.e., g mod the root polynomial. From $g_{j,k}$ we form the two children, $g_{j+1,2k} = g_{j,k} \bmod p_{j+1,2k}$
and $g_{j+1,2k+1} = g_{j,k} \bmod p_{j+1,2k+1}$. Note that, at depth j, we have $2^j$ polynomials $g_{j,k}$ of degree
$m/2^j - 1$ and $p_{j,k}$ of degree $m/2^j$.

We form the tree of $p_{j,k}$'s in a straightforward fashion, using the FFT algorithm to multiply
polynomials. Multiplying a pair of polynomials at depth j takes time $(m/2^j)$ polylog(d) and there
are $O(2^j)$ such problems, for total time m polylog(d) at depth j, and total time m polylog(d) in
aggregate over all O(log m) levels.

It remains to show how to reduce a polynomial g of degree 2n - 1 by a polynomial q of degree n
in time n polylog(d). First, we reduce $x^{n-1+2^k}$ for all $k = 0, 1, 2, \ldots, \lg(n)$. Suppose we have done
the reduction for $x^n, x^{n+1}, x^{n+3}, x^{n+7}, \ldots, x^{n-1+2^{k-1}}$. We claim that we can then reduce any
polynomial h of degree $n - 1 + 2^k$ by q in time n polylog(d). To see this, write

$$h(x) = x^{n-1+2^{k-1}} h'(x) + h''(x),$$

where h' has degree $2^{k-1}$ and h'' has degree $n - 1 + 2^{k-1}$. Then, multiply $x^{n-1+2^{k-1}} \bmod q$ by h'
and obtain a polynomial of degree $n - 1 + 2^{k-1}$, which we add to h''. This reduces the problem
for a polynomial of degree $n - 1 + 2^k$ to a polynomial of degree $n - 1 + 2^{k-1}$, in time n polylog(d).
Let us perform this reduction $k \le \lg(n)$ times so that we have a polynomial h of degree n. Once
we obtain h, we can reduce this polynomial directly, by writing $h(x) = x^n h' + h''(x)$, where h' is
constant and h'' has degree n - 1, and then adding h' times $x^n \bmod q$ to h''.

The above discussion for a polynomial h of degree $n - 1 + 2^k$ holds in particular if $h(x) = x^{n-1+2^k}$.
It follows by induction that we can reduce $x^{n-1+2^k}$ for all $k = 0, 1, 2, \ldots, \lg(n) - 1$ in time
n polylog(d). Finally, we apply the above again to reduce our arbitrary polynomial g. □
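
The tree-based reduction in this proof can be illustrated with a short sketch. To stay self-contained it swaps in schoolbook polynomial multiplication and long division in place of the FFT-based routines, so it runs in quadratic rather than m polylog(d) time, and it rebuilds each subproduct on the way down instead of storing the tree; only the divide-and-conquer structure matches the argument. The function names are ours.

```python
def poly_mul(a, b, p):
    """Schoolbook product of coefficient lists (lowest degree first) mod p."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] = (out[i + j] + x * y) % p
    return out


def poly_mod(a, q, p):
    """Remainder of a modulo a monic polynomial q of degree >= 1, mod p."""
    a = [c % p for c in a]
    n = len(q) - 1                           # degree of q
    for i in range(len(a) - 1, n - 1, -1):
        c = a[i]
        if c:
            for j in range(n + 1):           # subtract c * x^(i-n) * q
                a[i - n + j] = (a[i - n + j] - c * q[j]) % p
    return a[:n]


def multipoint_eval(g, points, p):
    """Evaluate g at every point by reducing g down a binary subproduct tree."""
    if len(points) == 1:
        # g mod (x - t) is the constant g(t).
        return [poly_mod(g, [(-points[0]) % p, 1], p)[0]]
    half = len(points) // 2
    results = []
    for subset in (points[:half], points[half:]):
        prod = [1]
        for a in subset:                     # product of (x - a_i) over the subset
            prod = poly_mul(prod, [(-a) % p, 1], p)
        results += multipoint_eval(poly_mod(g, prod, p), subset, p)
    return results


# g(x) = 5 + 3x^2 + x^3 over Z_97, evaluated at four points.
values = multipoint_eval([5, 0, 3, 1], [2, 10, 33, 96], 97)
```

Replacing `poly_mul` and `poly_mod` with FFT-based multiplication and the fast reduction described above is what brings the cost down to the bound in Lemma 7.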

We note that we can find a suitable prime p in time poly(d) by testing all numbers from d to
poly(d). This is a preprocessing step and the time does not count against the claimed measurement
time of d polylog(d) or the claimed decoding time of m polylog(d) of our algorithm. (In fact, time
polylog(d) suffices to find a prime.) We omit details.

From the preceding lemmas, we conclude: 

Theorem 8. There is an implementation of the Chaining Pursuit algorithm that runs in time
m polylog(d) and requires total storage of m log(d) numbers bounded by poly(d) (i.e., O(log d) bits each).
Figure 2. Chaining Pursuit algorithm

Algorithm: Chaining Pursuit

Inputs: Number m of spikes, the sketch V, the isolation matrix A
Output: A list of m spike locations and values

For each pass $k = 0, 1, \ldots, \log_a m$:
    For each trial $t = 1, 2, \ldots, O(k \log d)$:
        For each measurement $n = 1, \ldots, O(m/2^k)$:
            Use bit tests to identify the spike position
            Use a bit test to estimate the spike magnitude
        Retain $m_k$ distinct spikes with values largest in magnitude
    Retain spike positions that appear in more than 9/10 of trials
    Estimate final spike sizes using medians
    Encode the spikes using the measurement operator
    Subtract the encoded spikes from the sketch
Return the signal consisting of the m largest retained spikes.



4. Signal Approximation with Chaining Pursuit 

Suppose that the original signal f is well-approximated by a signal with m nonzero entries
(spikes). The goal of the Chaining Pursuit algorithm is to use a sketch of the signal to obtain
a signal approximation with no more than m spikes. To do this, the algorithm first finds an
intermediate approximation g with possibly more than m spikes, then returns $g_m$, the restriction
of g to the m positions that maximize the coefficient magnitudes of g. We call the final step of
the algorithm the pruning step. The algorithm without the pruning step will be called Chaining
Pursuit Proper; we focus on that until Section 5.4.

The Chaining Pursuit Proper algorithm proceeds in passes. In each pass, the algorithm recovers 
a constant fraction of the remaining spikes. Then it sketches the recovered spikes and updates 
the data matrix to reflect the residual signal — the difference between the given signal and the 
superposition of the recovered spikes. After O(logm) passes, the residual has no significant entries 
remaining. 

The reason for the name "Chaining Pursuit" is that this process decomposes the signal into 
pieces with supports of geometrically decreasing sizes. It resembles an approach in analysis and 
probability, also called chaining, that is used to control the size of a function by decomposing it 
into pieces with geometrically decreasing sizes. A famous example of chaining in probability is to 
establish bounds on the expected supremum of an empirical process [Tal05]. For an example of
chaining in Theoretical Computer Science, see [IN05].

4.1. Overview of Algorithm. The structure of the Chaining algorithm is similar to other sublinear
approximation methods described in the literature [GGI+02b]. First, the algorithm identifies
spike locations and estimates the spike magnitudes. Then it encodes these spikes and subtracts
them from the sketch to obtain an implicit sketch of the residual signal. These steps are repeated
until the number of spikes is reduced to zero. The number $a$ that appears in the statement of the
algorithm is a sufficiently large constant that will be discussed further in the analysis below, and the
quantity $m_k$ is $m/a^k$. Pseudocode is given in Figure 2.
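
To make the bookkeeping concrete, the pass schedule can be sketched in Python. This is an illustrative sketch only: the function name `pass_schedule` is hypothetical, and it assumes that the budget $m_k = m/a^k$ is tracked as a real number and that passes stop once $m_k$ drops below one.

```python
def pass_schedule(m, a):
    """Return the spike budgets m_k = m / a**k for passes k = 0, 1, ...

    Assumption: a > 1, and the loop stops as soon as m_k < 1,
    i.e. when no spikes are expected to remain.
    """
    budgets = []
    mk = float(m)
    while mk >= 1.0:
        budgets.append(mk)
        mk /= a
    return budgets
```

For $m = 8$ and $a = 2$ this yields four passes with budgets 8, 4, 2, 1, and the budgets sum to less than $m \cdot a/(a-1)$, which is the geometric decay the analysis relies on.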
4.2. Implementation. Most of the steps in this algorithm are straightforward to implement using
standard abstract data structures. The only point that requires comment is the application of bit 
tests to identify spike positions and values. 

Recall that a measurement is a row of the sketch matrix, which consists of $\lceil \log_2 d \rceil + 1$ numbers:

\[ [\, b(0) \;\; b(1) \;\; \cdots \;\; b(\lceil \log_2 d \rceil - 1) \mid c \,]. \]

The number $c$ arises from the top row of the bit test matrix. We obtain an (estimated) spike
location from these numbers as follows. If $|b(i)| \ge |c - b(i)|$, then the $i$th bit of the location is
zero. Otherwise, the $i$th bit of the location is one. To estimate the value of the spike from the
measurements, we use $c$.
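
The decoding rule can be sketched in Python. This is a minimal illustration, not the authors' implementation. The function names are hypothetical, and the sketch assumes that $b(i)$ sums the entries whose index has $i$th bit equal to zero (the convention under which the comparison above yields a zero bit) and that $c$ sums all entries. Ties are resolved toward a zero bit.

```python
def bit_test_measure(x):
    """Return ([b(0), ..., b(L-1)], c) for a vector x restricted to one subset.

    Assumption: b(i) sums the entries whose index has i-th bit 0,
    and c sums all entries (the all-ones top row of the bit test matrix).
    """
    d = len(x)
    num_bits = max(1, (d - 1).bit_length())  # ceil(log2 d) for d >= 2
    b = [sum(v for j, v in enumerate(x) if not (j >> i) & 1)
         for i in range(num_bits)]
    return b, sum(x)


def decode_spike(b, c):
    """Estimate (location, value): bit i is 0 iff |b(i)| >= |c - b(i)|."""
    location = 0
    for i, bi in enumerate(b):
        if abs(bi) < abs(c - bi):
            location |= 1 << i  # the mass sits on the "bit = 1" side
    return location, c
```

With one large entry and small noise elsewhere, `decode_spike` returns the index of the large entry and a value that is off by at most the total noise.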

Recall that each measurement arises by applying the bit test matrix to a copy of the signal 
restricted to a subset of its components. It is immediate that the estimated location and value are 
accurate if the subset contains a single large component of the signal and the other components 
have smaller $\ell_1$ norm.

We encode the recovered spikes by accessing the columns of the isolation matrix corresponding 
to the locations of these spikes and then performing a sparse matrix-vector multiplication. Note 
that this step requires random access to the isolation matrix. 
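
The encoding and subtraction steps can be sketched as follows. This is a hedged sketch under stated assumptions: `rows_of` is a hypothetical accessor returning the measurements (rows of the isolation matrix) that contain a given position, the sketch is stored as a dictionary from measurement index to its list of bit-test numbers, and $b(i)$ collects positions whose $i$th bit is zero.

```python
def encode_spikes(spikes, rows_of, num_bits):
    """Encode a sparse signal {position: value} into sketch rows.

    rows_of(j) is assumed to list the rows r with A[r, j] = 1.
    Each row is [b(0), ..., b(num_bits - 1), c].
    """
    encoded = {}
    for j, v in spikes.items():
        for r in rows_of(j):
            row = encoded.setdefault(r, [0.0] * (num_bits + 1))
            for i in range(num_bits):
                if not (j >> i) & 1:  # assumed convention for b(i)
                    row[i] += v
            row[num_bits] += v  # c: the all-ones test
    return encoded


def subtract_encoded(sketch, encoded):
    """Update the sketch in place so that it describes the residual signal."""
    for r, row in encoded.items():
        target = sketch[r]
        for i, value in enumerate(row):
            target[i] -= value
```

Only the columns of the recovered spikes are touched, so the work is proportional to the number of spikes times the number of rows per column, not to the signal length $d$.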

4.3. Storage costs. The primary storage cost derives from the isolation matrix $A$. Otherwise,
the algorithm requires only $O(m \log d)$ working space.

4.4. Time costs. During pass $k$, the primary cost of the algorithm occurs when we encode the
recovered spikes. The number of recovered spikes is at most $O(m/a^k)$, so the cost of encoding these
spikes is $O(m a^{-k} \log^2(m) \log^2(d))$. The cost of updating the sketch is the same. Summing over all
passes, we obtain $O(m \log^2(m) \log^2(d))$ total running time.
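
The sum over passes is a geometric series. As a short worked step, assuming $a > 1$ and passes indexed by $k = 0, 1, \ldots, \log_a m$:

\[ \sum_{k=0}^{\log_a m} O\!\left(m\,a^{-k}\log^2(m)\log^2(d)\right) \;\le\; O\!\left(m\log^2(m)\log^2(d)\right) \sum_{k=0}^{\infty} a^{-k} \;=\; \frac{a}{a-1}\, O\!\left(m\log^2(m)\log^2(d)\right). \]

Since $a$ is a constant, the factor $a/(a-1)$ is absorbed into the $O(\cdot)$.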

5. Analysis of Chaining Pursuit 

This section contains a detailed analysis of the Chaining Pursuit Proper algorithm (i.e., Chaining
Pursuit without the final pruning step), which yields the following theorem. Fix an isolation matrix
$A$ which satisfies the conclusions of Condition 11 in the sequel and let $\Phi = A \otimes_r B$, where $B$ is a
bit test matrix.

Theorem 9 (Chaining Pursuit Proper). Suppose that $f$ is a $d$-dimensional signal whose best
$m$-term approximation with respect to the $\ell_1$ norm is $f_m$. Given the sketch $V = \Phi f$ and the matrix
$\Phi$, Chaining Pursuit Proper produces a signal $\hat f$ with at most $O(m)$ nonzero entries. This signal
estimate satisfies

\[ \|f - \hat f\|_1 \le (1 + C \log m)\, \|f - f_m\|_1. \]

In particular, if $f_m = f$, then also $\hat f = f$.
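
The quantities in this guarantee are easy to compute for a small example. The sketch below is illustrative only (the function names are hypothetical): `best_m_term` keeps the $m$ entries of largest magnitude, which is also the operation used by the pruning step, and `l1_error` evaluates the $\ell_1$ distance between two signals.

```python
def best_m_term(f, m):
    """Keep the m entries of f with largest magnitude and zero out the rest."""
    order = sorted(range(len(f)), key=lambda i: abs(f[i]), reverse=True)
    keep = set(order[:m])
    return [v if i in keep else 0.0 for i, v in enumerate(f)]


def l1_error(f, g):
    """Return the l1 norm of f - g."""
    return sum(abs(x - y) for x, y in zip(f, g))
```

For a signal `f` and an estimate `f_hat`, the theorem bounds `l1_error(f, f_hat)` by a factor $(1 + C \log m)$ times `l1_error(f, best_m_term(f, m))`.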

5.1. Overview of the analysis. Chaining Pursuit Proper is an iterative algorithm. Intuitively, 
at some iteration k, we have a signal that consists of a limited number of spikes (positions whose 
coefficient is large) and noise (the remainder of the signal). We regard the application of the isolation
matrix $A$ as repeated trials of partitioning the $d$ signal positions into $O(m_k)$ random subsets, where
$m_k$ is approximately the number of spikes, and approximately the ratio of the 1-norm of the noise
to the magnitude of spikes. There are two important phenomena:

• A measurement may have exactly one spike, which we call isolated.

• A measurement may get approximately its fair share of the noise: approximately the
fraction $1/\mu$, if $\mu$ is the number of measurements.
If both occur in a measurement, then it is easy to see that the bit tests will allow us to recover
the position of the spike and a reasonable estimate of the coefficient (that turns out to be accurate 
enough for our purposes). With high probability, this happens to many measurements. 
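
The isolation phenomenon is easy to explore numerically. The following sketch is an illustration under simplifying assumptions: only the spike positions are hashed (noise is ignored), the assignment of positions to measurements is uniform and independent, and the function name `count_isolated` is hypothetical.

```python
import random


def count_isolated(spike_positions, num_buckets, rng):
    """Throw each spike into a random bucket; count spikes alone in theirs."""
    bucket_of = {p: rng.randrange(num_buckets) for p in spike_positions}
    load = {}
    for b in bucket_of.values():
        load[b] = load.get(b, 0) + 1
    return sum(1 for b in bucket_of.values() if load[b] == 1)
```

Calling `count_isolated(range(n), N, random.Random(seed))` for several seeds, with `N` a large constant multiple of `n`, shows that most spikes are typically isolated in a single trial.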

Unfortunately, a measurement may get zero spikes, more than one spike, and/or too much noise.
In that case, the bit tests may return a location that does not correspond to a spike, and our
estimate of the coefficient may have error too large to be useful. When we then subtract the
"recovered" spike from the signal, we actually introduce additional spikes and internal noise into
the signal. We bound both of these phenomena. If we introduce a false spike, our algorithm has
a chance to recover it in future iterations. If we introduce a false position with small magnitude,
however, our algorithm may not recover it later. Thus the internal noise may accumulate and
ultimately limit the performance of our algorithm; this is the ultimate source of the logarithmic
factor in our accuracy guarantee.

In pass $k = 0$, the algorithm is working with measurements of the original signal $f$. This signal
can be decomposed as $f = f_m + w$, where $f_m$ is the best $m$-term approximation of $f$ (spikes)
and $w$ is the remainder of the signal, called external noise. If $w = 0$, the analysis becomes quite
simple. Indeed, in that case we exactly recover a constant fraction of spikes in each pass; so we will
exactly recover the signal $f$ in $O(\log m)$ passes. In this respect, Chaining is superficially similar to,
e.g., [GGI+02b]. An important difference is that, in the analysis of Chaining Pursuit, we exploit
the fact that a fraction of spikes is recovered except with probability exponentially small in the
number of spikes; this lets us take a union bound over all configurations of spike positions and,
ultimately, obtain a uniform failure guarantee.

The major difficulty of the analysis here is to keep the approximation error from
blowing up in a geometric progression from pass to pass. More precisely, while it is comparatively
easy to show that, for each signal, the error remains under control, providing a uniform guarantee,
such as we need, is more challenging. In the presence of external noise $w \ne 0$, we can still recover a
constant fraction of spikes in the first pass, although with error whose $\ell_1$ norm is proportional to the
$\ell_1$ norm of the noise $w$. This error forms the "internal noise", which will add to the external noise
in the next round. So, the total noise doubles at every round. After the $\log_a m$ rounds (needed to
recover all spikes), the error of recovery will become polynomial in $m$. This is clearly unacceptable:
Theorem 9 claims the error to be logarithmic in $m$.
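
The polynomial blow-up can be checked with one line of arithmetic, assuming the noise exactly doubles in each of the $\log_a m$ rounds:

\[ 2^{\log_a m} = m^{\log_a 2} = m^{1/\log_2 a}, \]

which is a fixed positive power of $m$ (for example, $m^{1/3}$ when $a = 8$), rather than a factor of order $\log m$.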

This calls for a more delicate analysis of the error. Instead of adding the internal noise as a 
whole to the original noise, we will show that the internal noise spreads out over the subsets of 
the random partitions. So, most of the measurements will contain a small fraction of the internal 
noise, which will yield a small error of recovery in the current round. The major difficulty is to 
prove that this spreading phenomenon is uniform: one isolation matrix spreads the internal noise
for all signals $f$ at once, with high probability. This is quite a delicate problem. Indeed, in the last
passes a constant number of spikes remain in the signal, and we have to find them correctly. So, the
spreading phenomenon must hold for all but a constant number of measurements. Allowing so few
exceptional measurements naturally yields only a weak probability bound for the phenomenon.
On the other hand, in the last passes the internal noise is very large (having accumulated in
all previous passes). Yet we need the spreading phenomenon to be uniform over all possible choices
of the internal noise. It may seem that the weak probability estimates would not be sufficient to
control a large internal noise in the last passes.

We will resolve this difficulty by doing "surgery" on the internal noise, decomposing it into pieces
corresponding to the previous passes, proving corresponding uniform probability estimates for each
of these pieces, and uniting them in the end. This leads to Condition 11, which summarizes the
needed properties of the isolation matrix.
The proof of Theorem 9 is by induction on the pass $k$. We will normalize the signal so that
$\|w\|_1 = 1/(400000a)$. We will actually prove a result stronger than Theorem 9. The following is
our central loop invariant:

Invariant 10. In pass $k$, the signal has the form

\[ f^{(k)} = s_k + w + \sum_{j=0}^{k-1} \nu_j \tag{5.1} \]

where $s_k$ contains at most $m_k$ spikes, $w = f - f_m$ is the external noise, and each vector $\nu_j$ is the
internal noise from pass $j$, which consists of $3m_j$ or fewer nonzero components with magnitudes at
most $2/m_j$.

When we have finished with all passes (that is, when $k = 1 + \log_a m$), we will have no more spikes
in the signal ($m_k = 0$, thus $s_k = 0$). This at once implies Theorem 9.

The proof that Invariant 10 is maintained will only use two properties of an isolation matrix,
given in Condition 11. While we only know how to construct such matrices using randomness, any
matrix satisfying these properties is acceptable. Section 5.2 will prove that Invariant 10 holds for any
matrix $\Phi$ having the properties in Condition 11. Section 5.3 proves that most matrices (according
to the definition implicit in Section 2.2) satisfy these properties. Note that the conditions are given
in terms of matrix actions upon certain kinds of signals, but the conditions are properties only of
matrices.

Condition 11 (Chaining Recovery Conditions for Isolation Matrices). A 0-1 matrix with the pass/trial
hierarchical structure described in Section 2 (i.e., any matrix from the sample space described
there) is said to satisfy the Chaining Recovery Conditions if, for any signal of the form
in Invariant 10 and for any pass $k$, at least 99/100 of the trial submatrices have these two
properties:

(1) All but $\frac{1}{100} m_{k+1}$ spikes appear alone in a measurement, isolated from the other spikes.

(2) Except for at most $\frac{1}{100} m_{k+1}$ of the measurements, the internal and external noise assigned
to each measurement has $\ell_1$ norm at most $\frac{1}{1000} m_k^{-1}$.

5.2. Deterministic Part. 

In this section, we consider only matrices satisfying Condition 11. Proposition 12 considers
the performance of the algorithm in one of the 99/100 non-exceptional trials under an artificial
assumption that will be removed in Proposition 17. Following that, we consider the performance
of the combination of trials, prove that Invariant 10 is maintained, and conclude about the overall
performance of Chaining Pursuit Proper.

Proposition 12 (One Trial, No Inaccuracies). Suppose that a trial is not exceptional. Assume
that each measurement contains at most one spike and that the external noise in each measurement
is no greater than $\varepsilon = \frac{1}{1000} m_k^{-1}$. Then the trial constructs a list of at most $m_k$ spikes.

(1) If $|f^{(k)}(i)| > 2\varepsilon$, then the list contains a spike with position $i$ and estimated value $f^{(k)}(i) \pm \varepsilon$.

(2) If the list contains a spike with position $i$ and $|f^{(k)}(i)| < 4\varepsilon$, then the estimated value of the
spike is no more than $5\varepsilon$ in magnitude.

We call list items that satisfy these estimates accurate. 

Proof. To prove this proposition, we outline a series of lemmas. We begin with a simple observation 
about the performance of the bit-tests. 

Lemma 13. Assume that a measurement contains a position $i$ of value $p$ and that the $\ell_1$ norm of
the other positions in the measurement is at most $\varepsilon$. Then

(1) The estimated value $p_{\rm est}$ is bounded by the total measurement; that is, $|p_{\rm est}| \le |p| + \varepsilon$.

(2) If $|p| > 2\varepsilon$, then the estimated position is $i$ (i.e., the bit-test locates the position correctly)
and the estimated value is within $\varepsilon$ of $p$, $|p_{\rm est} - p| \le \varepsilon$.

Proof. Follows immediately from the definitions of the bit-tests and the estimation procedures. □ 
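
For completeness, the estimates can be verified directly, assuming (as in Section 4.2) that the value estimate is the sum of all entries in the measurement and that the test for each bit compares the sum over positions sharing one bit value with the sum over the rest. Fix a bit index $r$, let $u$ be the sum of the other entries whose $r$th bit agrees with that of position $i$, and let $v$ be the sum of those whose $r$th bit differs, so that $|u| + |v| \le \varepsilon$. The symbols $r$, $u$, $v$ are introduced here only for this check. Then

\[ |p_{\rm est}| = |p + u + v| \le |p| + \varepsilon, \qquad |p_{\rm est} - p| = |u + v| \le \varepsilon, \]

\[ |p| > 2\varepsilon \;\Longrightarrow\; |p + u| \ge |p| - |u| > 2\varepsilon - \varepsilon = \varepsilon \ge |v|, \]

so the side containing position $i$ dominates for every bit and the location is read correctly.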

Let us set $\varepsilon = \frac{1}{1000} m_k^{-1}$ and observe that [...] $\le \frac{1}{1000} m_k^{-1} = \varepsilon$. Definition 14 establishes two
criteria that most measurements satisfy. We refer to measurements of these two types as good
measurements and define them precisely.

Definition 14. A good measurement satisfies one of the following two criteria:

(1) The measurement is empty; that is, it contains positions with values $|f^{(k)}(i)| \le \varepsilon$ and the
total $\ell_1$ norm of the positions in the measurement is less than $1.5\varepsilon$.

(2) The measurement contains one spike at position $i$ with $|f^{(k)}(i)| > \varepsilon$ and the $\ell_1$ norm of all
other positions in this measurement is less than $0.5\varepsilon$.

The next lemma states that for good measurements, the bit-tests return reasonably accurate 
estimates. 

Lemma 15. Assume that a measurement is a good one. If the measurement is empty, then the
estimated value of $f^{(k)}$ in that measurement is no more than $1.5\varepsilon$. If the measurement contains one
spike, then the estimated position is the position of the spike and its estimated value is within $0.5\varepsilon$
of the true value of the spike.

Proof. Follows from the definitions of good measurements and Lemma 13. □

The next lemma follows from the previous argument and demonstrates that if the bit-tests 
identify a spike position, they do so reasonably accurately and precisely. 

Lemma 16. Assume that in a single measurement the bit-tests identify and estimate a spike at
position $i$ and that the estimated value $p_{\rm est}$ is greater than $1.5\varepsilon$ in magnitude, $|p_{\rm est}| > 1.5\varepsilon$. Then the
measurement contains a spike at position $i$ and the true value $p$ is within $0.5\varepsilon$ of $p_{\rm est}$.

Strictly speaking, we perform multiple trials at each round $k$. We obtain, in a single trial, an
estimated position and its estimated value, which we call the preliminary estimated value for that
position. If, after performing all the trials, we have more than one preliminary estimated value
for an estimated position, we simply use the preliminary value with the largest absolute value as
the estimate assigned to this position. We then identify the positions with the largest assigned
estimates. A simple argument (which we omit here for brevity) demonstrates that the true value
$p$ of a spike at position $i$ with $|p| > 2\varepsilon$ is assigned an estimate $p_{\rm est}$ within $0.5\varepsilon$ of $p$. Furthermore,
if the assigned estimate $p_{\rm est}$ of a spike at position $i$ satisfies $|p_{\rm est}| > 1.5\varepsilon$, then the true value $p$ is
within $0.5\varepsilon$ of $p_{\rm est}$. To simplify our arguments in what follows, we simply refer to the estimated
values in one trial as the assigned estimate values, $\hat f^{(k)}(i)$.

With the above lemmas, we are able to complete the proof of the proposition. Our previous
discussion shows that those positions $i$ with $|f^{(k)}(i)| > \varepsilon$ include the positions with estimated
values larger than $1.5\varepsilon$; i.e.,

\[ \{\, i : |\hat f^{(k)}(i)| > 1.5\varepsilon \,\} \subseteq \{\, i : |f^{(k)}(i)| > \varepsilon \,\}. \]

Our inductive hypothesis assumes that there are at most $m_k$ positions in the right set above, so
there are at most $m_k$ positions in the left set as well. Our algorithm (for one trial) identifies all
of these positions and reports estimated values that are within $0.5\varepsilon$ of the true values. Hence, our
list of identified positions includes those $i$ with $|f^{(k)}(i)| > 2\varepsilon$. If a position $i$ is identified and if
$|f^{(k)}(i)| < 4\varepsilon$, then its estimated value is at most $5\varepsilon$ in magnitude by the previous lemmas. This
proves the proposition. □
The next proposition removes the artificial assumption on the spikes and noise.



Proposition 17 (One Trial). Suppose that the trial is not an exceptional trial. In this trial, suppose
each measurement is a good one, except for at most a small constant multiple of $m_{k+1}$ measurements.
Then the trial constructs a list of at most $m_k$ spikes. All items in the list are accurate, except
for at most a small constant multiple of $m_{k+1}$.

Proof. We begin the proof with a lemma that shows the list produced by the algorithm is stable 
with respect to changes in a few measurements. 

Lemma 18. Assume that we perform one trial of the algorithm with two different signals and that 
their measurements (in this one trial) are identical except for b measurements. Then the estimated 
signals are identical except in 2b positions. 

Proof sketch. Prove this for $b = 2$ and then proceed by induction. □

Let us now consider the set of bad measurements, which by assumption number at most a small
constant multiple of $m_{k+1}$, and set to zero the signal positions that fall into these measurements.
This procedure creates two signals: the original signal and the restricted signal (with zeroed out
positions). The restricted signal satisfies the conditions in Proposition 12, so all its measurements
are good ones and agree with those of the original signal except on the bad measurements. The
previous lemma guarantees that the estimated signals for the original and restricted signals are
identical in all but twice that many positions. By our inductive hypothesis, the number of positions
of the original signal with value greater than $2\varepsilon$ in the exceptional measurements is also at most
a small constant multiple of $m_{k+1}$. Let us gather these two collections of exceptional positions
into one set of exceptions. It is straightforward to show that the positions not in this
exceptional set are good positions and, if they are identified, they are identified accurately. □

We combine results from all trials. The algorithm considers positions identified in at least 9/10 of
the total trials $T$. It then takes the median (over all trials) to estimate the values of these positions.
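
The combination rule can be sketched as follows. This is a hedged illustration: the function name `combine_trials` is hypothetical, each trial is assumed to be given as a dictionary from position to assigned estimate, and the median is taken over the trials in which the position was identified (one possible reading of "over all trials").

```python
from statistics import median


def combine_trials(trial_lists):
    """Keep positions seen in at least 9/10 of the trials; estimate by median.

    trial_lists: one dict {position: assigned estimate} per trial.
    """
    total = len(trial_lists)
    counts = {}
    for estimates in trial_lists:
        for pos in estimates:
            counts[pos] = counts.get(pos, 0) + 1
    combined = {}
    for pos, cnt in counts.items():
        if 10 * cnt >= 9 * total:  # integer form of cnt / total >= 9/10
            values = [est[pos] for est in trial_lists if pos in est]
            combined[pos] = median(values)
    return combined
```

The median makes the final estimate insensitive to the few trials in which a position was estimated badly, which is how the analysis tolerates a small fraction of inaccurate trials.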

Lemma 19 (Combining Trials). The number of list items that are inaccurate in more than 1/10
of the trials is at most $m_{k+1}$. The total number of positions that appear in 9/10 of the trials is at
most $\frac{10}{9} m_k$.

Proof. We prove the first part of the lemma with a simple counting argument. Let T denote the 
total number of trials. We have to bound b where 

b = ^{positions bad in > ^ trials} < ^{positions bad in > j\ good trials}. 

Let 

E = jf- {positions bad in trial j}. 

j£ good trials 

We have E > Let T' be the number of good trials. Then Proposition El tells us that E < 
^m fc+1 T' < ^-m k+1 . Therefore, ^ < %m k+1 and, hence, b < §§m fc+ i. 

To prove the second part, recall that the algorithm updates a position if and only if the position
is identified in at least 9/10 of the trials. Let $\ell$ denote the number of such positions. In every trial,
$m_k$ positions are identified. Hence,

\[ m_k T = \sum_{t=1}^{T} \#\{\text{positions identified in trial } t\} \ge \frac{9}{10}\, T \ell \]

and thus $\ell \le \frac{10}{9} m_k$. □

Now we are ready to prove the induction step. Recall that after round $k$, the new signal is the
difference between the current signal and its estimate: $f^{(k)} = f^{(k-1)} - \hat f^{(k-1)}$, with the convention
that if a signal position is not considered and not changed by the algorithm, its estimated value is
zero.



Lemma 20 (Induction Hypothesis). After pass $k$, there are at most $m_{k+1}$ spikes remaining. The
contribution $\nu_k$ to the internal noise contains at most $3m_k$ components with values at most $2/m_k$.

Proof. Recall that $4.5\varepsilon < m_k^{-1}$. It suffices to prove that for the non-exceptional positions $i$ that
satisfy the conclusions of Lemma 19, the value $|f^{(k)}(i)| \le 4.5\varepsilon < m_k^{-1}$. Let us fix such a position $i$
which is good in at least $\frac{9}{10}T$ trials and show that

\[ |f^{(k)}(i)| \le 4.5\varepsilon. \]

If $|f^{(k-1)}(i)| > 2\varepsilon$, then the goodness of $i$ in 9/10 of the trials implies that $i$ is identified in these trials
and hence $i$ is considered by the algorithm. In each of these trials, the goodness of $i$ also
means that the assigned estimated value of $i$ is within $0.5\varepsilon$ of its true value $f^{(k-1)}(i)$. Since
$\hat f^{(k-1)}(i)$ is the median of the assigned estimated values of $i$ in each trial, it follows that it is also
within $0.5\varepsilon$ of the true value $f^{(k-1)}(i)$.

Suppose that $|f^{(k-1)}(i)| \le 2\varepsilon$. If $i$ is not considered by the algorithm, then the value of the signal
at this position is not changed, so $|f^{(k)}(i)| = |f^{(k-1)}(i)| \le 2\varepsilon$. We can therefore assume that $i$ is
considered by the algorithm. There are $\frac{9}{10}T$ trials in which $i$ is identified and there are $\frac{9}{10}T$ trials
in which $i$ is good. Hence, there are at least $\frac{8}{10}T$ trials in which $i$ is both good and identified.
By the definition of goodness, this means that in each of these trials, the assigned estimated value
is at most $2.5\varepsilon$ in magnitude. Since the estimated value is the median of the assigned estimated values
of $i$ in each trial, it follows that $|\hat f^{(k-1)}(i)| \le 2.5\varepsilon$. Then

\[ |f^{(k)}(i)| = |f^{(k-1)}(i) - \hat f^{(k-1)}(i)| \le |f^{(k-1)}(i)| + |\hat f^{(k-1)}(i)| \le 2\varepsilon + 2.5\varepsilon = 4.5\varepsilon. \]

This completes the proof of the first part of the lemma.

The first part of this lemma shows formally that the difference between the spikes in the signal
and the large entries in the update signal (i.e., those with absolute values greater than $m_k^{-1}$)
contains at most $m_{k+1}$ terms. By the inductive step, the same holds for the previous rounds. In
addition, Lemma 19 tells us that the algorithm updates at most $\frac{10}{9} m_k$ positions in the signal. By
the triangle inequality it follows that the difference contains at most $m_{k+1} + m_k + \frac{10}{9} m_k \le 3m_k$
terms. The maximal absolute value of the difference signal is at most $2/m_k$. □



The previous lemma proves the induction hypothesis. The next and final lemma of this section
controls the recovery error and completes the proof of Theorem 9.

Lemma 21 (Total Spikes and Recovery Error). Chaining Pursuit Proper recovers at most $O(m)$
spikes. The total recovery error is at most $(1 + C \log m)\, \|w\|_1$.

Proof Sketch. After pass $K = \log_a m$, there are no more spikes remaining since $m_k = m/a^k < 1$ for $k > K$.
At most $\frac{10}{9} m_k$ spikes are recovered in pass $k$. Since $m_k$ decays geometrically, the total number
of spikes is $O(m)$. The error after the last pass is the $\ell_1$ norm of the signal $f^{(K+1)}$. This signal
consists of the external noise, which has norm $\|w\|_1$, and the internal noise, which satisfies

\[ \Big\| \sum_{j=1}^{\log_a m} \nu_j \Big\|_1 \le \sum_{j=1}^{\log_a m} 6 m_j / m_j = 6 \log_a m. \]

Since $a$ is a constant and $\|w\|_1$ was normalized to be constant, the overall error is at most
$(1 + C \log m)\, \|w\|_1$ for some constant $C$. □
5.3. Probabilistic Part. Here we prove that a random isolation matrix $A$ indeed satisfies the
Chaining Recovery Conditions (CRC) with high probability.

Theorem 22. With probability at least $1 - O(d^{-3})$, a matrix $A$ drawn from the distribution
described in Section 2 satisfies the Chaining Recovery Conditions (Condition 11).

The main lemmas of this section are as follows. First, Lemma 23 is an abstract technical lemma
about putting balls into buckets and the number of isolated balls that likely result. Lemma 24 is a
corollary for our context. We then show, in Lemma 26, that Condition 11 holds for most matrices.

Lemma 23 (Balls and Bins). Put $n$ balls randomly and independently into $N \ge C(M)\,n$ buckets.
Then, with probability $1 - 2e^{-9n}$, all except $n/M$ balls are isolated in their buckets.

Proof. One complication in the proof comes from the absence of independence among buckets. We
would rather let buckets choose balls. However, the contents of different buckets are dependent
because the total number of balls is limited. So we will replace the original $n$-ball model with an
independent model. The independent model is easier to handle by the standard large deviation
technique, and the original model is recovered from it by conditioning on the number
of balls.

The independent model is the following assignment. We divide each bucket into $n$ sub-buckets,
and let $\delta_{ki}$ be independent 0, 1 valued random variables with expectation $\mathbb{E}\,\delta_{ki} = 1/N$, for all buckets
$k = 1, \ldots, N$ and sub-buckets $i = 1, \ldots, n$. The independent random variables $X_k = \sum_{i=1}^{n} \delta_{ki}$ will
be called the number of balls in bucket $k$ in the independent model. If we condition on the total
number of such "balls", we obtain the distribution of the numbers of true balls $X'_k$ in bucket $k$ in
the original model:

\[ (X'_1, \ldots, X'_N) = \Big( X_1, \ldots, X_N \;\Big|\; \sum_{k=1}^{N} X_k = n \Big). \]

To prove the Lemma, we have to show that the number of non-isolated balls is small. The
number of non-isolated balls in bucket $k$ is $Y'_k = X'_k \cdot \mathbf{1}_{\{X'_k > 1\}}$, so the conclusion of the Lemma is
that

\[ \mathbb{P}\Big\{ \sum_{k=1}^{N} Y'_k > n/M \Big\} \le 2e^{-9n}. \tag{5.2} \]

We will now transfer this problem to the independent model. First, without loss of generality
we can change the $n$ balls in the lemma and in the original model to $0.9n$ balls. We do not
change the independent model, so the number of non-isolated balls in the independent model is
$Y_k = X_k \cdot \mathbf{1}_{\{X_k > 1\}}$. We have to bound

\[ \mathbb{P}\Big\{ \sum_{k=1}^{N} Y'_k > n/M \Big\} = \mathbb{P}\Big\{ \sum_{k=1}^{N} Y_k > n/M \;\Big|\; \sum_{k=1}^{N} X_k = 0.9n \Big\} \le \mathbb{P}\Big\{ \sum_{k=1}^{N} Y_k > n/M \;\Big|\; \sum_{k=1}^{N} X_k \ge 0.9n \Big\} \le \frac{\mathbb{P}\{ \sum_{k=1}^{N} Y_k > n/M \}}{\mathbb{P}\{ \sum_{k=1}^{N} X_k \ge 0.9n \}}. \]

By the Prokhorov-Bennett inequality,

\[ \mathbb{P}\Big\{ \sum_{k=1}^{N} X_k \ge 0.9n \Big\} = \mathbb{P}\Big\{ \sum_{k=1}^{N} \sum_{i=1}^{n} \delta_{ki} \ge 0.9n \Big\} \ge 1/2. \]
Therefore, proving Equation (5.2) in the original model reduces to proving that

\[ \mathbb{P}\Big\{ \sum_{k=1}^{N} Y_k > n/M \Big\} \le e^{-9n} \tag{5.3} \]

in the independent model.

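A standard way to prove deviation inequalities such as Equation (5.3) is through the moment
generating function. By Markov's inequality and independence of the variables $Y_k$ we have

\[ \mathbb{P}\Big\{ \sum_{k=1}^{N} Y_k > n/M \Big\} = \mathbb{P}\Big\{ e^{10M \sum_{k=1}^{N} Y_k} > e^{10n} \Big\} \le e^{-10n}\, \mathbb{E}\, e^{10M \sum_{k=1}^{N} Y_k} = e^{-10n} \big( \mathbb{E}\, e^{10M Y_1} \big)^{N}. \]

To complete the proof of Equation (5.3) it remains to show that for $Y = \big(\sum_{i=1}^{n} \delta_i\big) \cdot \mathbf{1}_{\{\sum_{i=1}^{n} \delta_i > 1\}}$,
its moment generating function satisfies

\[ \big( \mathbb{E}\, e^{10MY} \big)^{N} \le e^{n}, \tag{5.4} \]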



where $\delta_i$ are 0, 1 valued independent random variables with $\mathbb{E}\,\delta_i = 1/N$. To estimate the moment
generating function $\mathbb{E}[e^{10MY}]$ in Equation (5.4), it suffices to know the tail probability $\mathbb{P}\{Y > t\}$ for
large $t$. For large $t$, we estimate this tail probability by removing the restriction onto non-isolated
buckets in the definition of $Y$ and applying Chernoff's inequality for independent random variables.
The tail probability is, however, much smaller if we do restrict onto non-isolated buckets. We take
this into account for small $t$ by computing the expectation of $Y$ (which is straightforward).
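Let us start with the first moment of $Y$. We claim that

\[ \mathbb{E}[Y] \le C \Big( \frac{n}{N} \Big)^{2}. \tag{5.5} \]

Compare this with the average number of balls without conditioning on being non-isolated, $\mathbb{E}[X_k] = \frac{n}{N}$.
Indeed, by the linearity of expectation,

\begin{align*}
\mathbb{E}[Y] &= n \cdot \mathbb{E}\big[ \delta_1 \cdot \mathbf{1}_{\{\sum_{i=1}^{n} \delta_i > 1\}} \big] \\
&= n \cdot \mathbb{P}\{ \delta_1 = 1 \text{ and there exists } i \in \{2, \ldots, n\} : \delta_i = 1 \} \\
&= n \cdot \mathbb{P}\{ \delta_1 = 1 \} \cdot \big( 1 - \mathbb{P}\{ \forall i \in \{2, \ldots, n\} : \delta_i = 0 \} \big) \\
&= \frac{n}{N} \Big( 1 - \Big( 1 - \frac{1}{N} \Big)^{n-1} \Big) \le \Big( \frac{n}{N} \Big)^{2}.
\end{align*}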


Next, by the Chernoff inequality, for $s \ge 2$, we have

\[ \mathbb{P}\Big\{ \sum_{i=1}^{n} \delta_i > s \Big\} \le \Big( \frac{sN}{en} \Big)^{-s}. \tag{5.6} \]



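Now we are ready to bound the moment generating function. Let $K = 10M$ and change variables
$t = e^{Ks}$, so that

\[ \mathbb{E}\, e^{KY} = \int_0^{\infty} \mathbb{P}\{ e^{KY} > t \}\, dt = 1 + \int_1^{\infty} \mathbb{P}\{ e^{KY} > t \}\, dt = 1 + K \int_0^{\infty} \mathbb{P}\{ Y > s \}\, e^{Ks}\, ds. \]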
We split the integral in two parts. We use Equation (5.5) to estimate the integral near zero as

\[ \int_0^{3} \mathbb{P}\{ Y > s \}\, e^{Ks}\, ds \le e^{3K} \int_0^{\infty} \mathbb{P}\{ Y > s \}\, ds = e^{3K}\, \mathbb{E}[Y] \le C e^{3K} \Big( \frac{n}{N} \Big)^{2} \]
and we use Equation (5.6) to estimate the integral near infinity, $\int_3^{\infty} \mathbb{P}\{ Y > s \}\, e^{Ks}\, ds$.

Combining these two estimates and using $N \ge C(M)\,n$, we obtain Equation (5.4).
This proves the lemma. □



Lemma 24 (Isolations). Fix a round $k$. With probability at least $1 - \exp\{-4 m_k \log d\}$, the following
is true. In pass $k$, at least 99/100 of the trial submatrices isolate all but $\frac{1}{100} m_{k+1}$ of the spikes.

Proof sketch. In the hypothesis of Invariant 10 the signal has at most $n := m_k$ positions of value
larger than $1/m_{k-1}$. We put these in $N := C'(a)\, m/2^k$ buckets. We can choose the function
$C'(a)$ in terms of the function $C$ of Lemma 23 so that Lemma 23 applies with $n/M = \frac{1}{100} m_{k+1}$. Lemma 23
then states that all except $\frac{1}{100} m_{k+1}$ positions are isolated with probability $1 - \delta$, where $\delta = 2e^{-9 m_k}$.

Let $I$ be the event that all but $\frac{1}{100} m_{k+1}$ of the spikes are isolated. Let us repeat the random
assignment above independently $T$ times (for $T$ trials) and let $\delta_t$ be independent Bernoulli random
variables, $\mathbb{E}\,\delta_t = \delta$. We see by Chernoff's inequality that

\[ \mathbb{P}\Big\{ I \text{ fails in more than } \tfrac{1}{100} T \text{ trials} \Big\} \le \mathbb{P}\Big\{ \sum_{t=1}^{T} \delta_t > \tfrac{1}{100} T \Big\} \le (100 e \delta)^{T/100} \le \exp(-4 m_k \log d). \]

□



This concludes the proof of Lemma 24.

We now proceed to prove that Invariant 10 is maintained by most isolation matrices $A$. That
is, we need to show that, for most $A$, when the Chaining Pursuit algorithm uses $A$ on a signal
satisfying Invariant 10 for round $k$, the algorithm produces a signal satisfying Invariant 10 for round
$k + 1$. So, in the remainder of this section, we may fix a signal satisfying Invariant 10 for round $k$.
Lemma 25 controls the external noise and Lemma 26 controls the internal noise.



Lemma 25 (External noise). In pass $k$, in every trial, the number of measurements where the $\ell_1$
norm of the external noise exceeds $\frac{1}{2000} m_k^{-1}$ is at most $\frac{1}{200} m_{k+1}$.



Proof sketch. This is an easy part of the argument. The (1, 1) operator norm of each matrix A^ 
equals one, so it does not inflate the norm of the external noise. We use Markov's inequality to 
bound the number of measurements with too much noise. □ 
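
The Markov step can be made explicit. Assume the normalization $\|w\|_1 = 1/(400000a)$ from Section 5.1, read the threshold in the lemma as $\frac{1}{2000} m_k^{-1}$, and use that the per-measurement noise norms in one trial sum to at most $\|w\|_1$ (the operator norm statement above). Then

\[ \#\Big\{ \text{measurements with external noise} > \frac{1}{2000\, m_k} \Big\} \le \frac{\|w\|_1}{1/(2000\, m_k)} = \frac{2000\, m_k}{400000\, a} = \frac{m_k}{200\, a} = \frac{1}{200}\, m_{k+1}, \]

where the last equality uses $m_{k+1} = m_k / a$.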

Lemma 26 (Internal noise). Fix a round $k$. With probability at least $1 - \exp\{-4 m_k \log d\}$, the
following is true. In pass $k$, in at least 99/100 of the trials, the number of measurements where the
$\ell_1$ norm of the internal noise exceeds $\frac{1}{2000} m_k^{-1}$ is at most $\frac{1}{200} m_{k+1}$.

Proof. Let us recall Invariant 10. This lemma is a statement about the signal in the
positions with values smaller than $1/m_{k-1}$. Lemma 25 gives us a proof for $k = 0$.

Let $k \ge 1$. We may assume that the external noise $w$ is zero and we may absorb the spikes in
Equation (5.1) into the first term; i.e., we can assume that our signal has the form

\[ f^{(k)} = \sum_{j=0}^{k-1} \nu_j \tag{5.7} \]

where $\nu_j$ consists of $4m_j$ or fewer nonzero components with magnitudes at most $2/m_j$.

To prove this result, we introduce positive parameters $\lambda_j$, $\varepsilon_j$, $j = 0, \ldots, k-1$, which satisfy

^•<- (5.8) 



3=0 C 



and 

k-l 



C'a 

3=0 

where $C$ is a positive absolute constant to be chosen later. Next, we will prove the following
separate claim about the internal noise $\nu_j$.

Claim 27. Assume that a signal satisfies Equation (5.7). Let $j \in \{0, \ldots, k-1\}$. Then

# (^measurements in trial t s.t. > ~~^~) < £j™kT (5.10) 

t=i mk 

with probability

\[ 1 - e^{-\gamma m_j T} \tag{5.11} \]

where $\gamma$ is some positive number such that

\[ \gamma T \ge 10 \log d. \tag{5.12} \]

Proof that Claim 27 implies Lemma 26. We will first show that this claim implies Lemma 26. Assume
the claim holds. By the definition of $T$, the exceptional probability is

e - 7 mjT < e -10m 3 logd < ( d 

~ \4mj 

while the number of choices of round j signal is ). Hence, with probability 1 — ( 4 d ) 3 , the 

inequality in Equation 1)5.10(1 holds uniformly for all choices of internal noise 17 . Summing up these 
exceptional probabilities for all rounds j = 0, . . . , k — 1, we conclude that: 

with probability 1 — ( 4 fL) , the system of measurements is such that the inequality 
in Equation (|5.1U|) holds uniformly for all choices of the signal satisfying Equa- 
tion (f5"7|) . 

Fix a system of measurements which meets these requirements so that we can dismiss the probability 
issues. Let us also fix a measurement v and a trial t. We refer to v, as the signal in measurement v 
and trial t. By Equation (5.8), for a fixed trial and a fixed measurement v, we have the containment
of events: 



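$$\left\{\|\nu\|_1 > \frac{1}{C' m_k}\right\} \subseteq \bigcup_{j=0}^{k-1} \left\{\|\nu_j\|_1 > \frac{\lambda_j}{m_k}\right\},$$

where $\nu = \sum_{j=0}^{k-1} \nu_j$ and all norms are taken over the positions assigned to the fixed measurement.

Counting the measurements that satisfy each side of this containment, then summing over the
trials, we obtain:

$$\begin{aligned}
\sum_{t=1}^{T} \#\left\{\text{measurements s.t. } \|\nu\|_1 > \frac{1}{C' m_k}\right\}
&\le \sum_{t=1}^{T} \sum_{j=0}^{k-1} \#\left\{\text{measurements s.t. } \|\nu_j\|_1 > \frac{\lambda_j}{m_k}\right\} \\
&\le \sum_{j=0}^{k-1} \varepsilon_j m_k T && \text{by (5.10)} \\
&\le \frac{1}{C'a} m_k T && \text{by (5.9)} \\
&= \frac{1}{C'} m_{k+1} T.
\end{aligned}$$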

By Markov's inequality, this implies (provided C is chosen large enough) that in at most j^qT 
trials t, the number of measurements where the t\ norm of the internal noise exceeds 2 ooom fc * s 
greater than ^m^i, This implies the conclusion of Lemma l2lfl 

Proof of the claim. Now we prove the claim itself. This is a purely probabilistic problem. We
call the nonzero positions of $\nu_j$ "balls" and we informally refer to the measurements as "buckets".

By the definition of $\nu_j$, the 1-norm of $\nu_j$ in a measurement (or bucket) $v$ is large, $\|\nu_j\|_1 > \lambda_j / m_k$, if
and only if the measurement contains at least $\frac{1}{2}\lambda_j \frac{m_j}{m_k}$ balls. Then Equation (5.10) is equivalent to

$$\sum_{t=1}^{T} \#\left\{\text{measurements in trial } t \text{ which contain} > \tfrac{1}{2}\lambda_j \tfrac{m_j}{m_k} \text{ balls}\right\} \le \varepsilon_j m_k T. \tag{5.13}$$

To prove this with the required probability (5.11), we will transfer the problem to an independent
model, similar to the proof of Lemma 23.

Recall that the original model with $n$ balls and $T$ trials, in which we want to prove Equation (5.13),
is to put $n = 4m_j$ balls into $N$ buckets (or measurements) independently, and
repeat this $T$ times (trials) independently.

We want to replace this by the following independent model, where the contents of buckets are
independent. There are $N$ buckets in each of $T$ trials. Divide each bucket into $S = nT$ sub-buckets.
Let $\delta_{tli}$ be independent 0-1 valued random variables with expectation $\mathbb{E}\delta_{tli} = 1/(NT)$, for all trials
$t = 1, \dots, T$, buckets $l = 1, \dots, N$ and sub-buckets $i = 1, \dots, S$. The independent random variables

$$X_{tl} = \sum_{i=1}^{S} \delta_{tli} \tag{5.14}$$

will be called the number of balls in bucket $l$, trial $t$, in the independent model. Note that

$$\mathbb{E} X_{tl} = \frac{S}{NT} = \frac{n}{N}.$$

Thus the average total number of balls in buckets in one trial is $n$. Let $E_{\mathrm{tot}}$ be the event that
in each of at least $T/2$ trials, the total number of balls in buckets is at least $n/2$. It is then
easy to deduce by Chernoff and Prokhorov-Bennett's inequalities that with probability at least $1/2$
(actually $1 - e^{-nT}$), $E_{\mathrm{tot}}$ holds; that is, $P\{E_{\mathrm{tot}}\} \ge 1/2$. We have to bound above the probability of
the event

$$E := \{\text{Equation (5.13) does not hold}\}$$

in the original model. We reduce it to the independent model as follows:

$$\begin{aligned}
P\{E \text{ in independent model}\} &\ge P\{E \text{ in independent model} \mid E_{\mathrm{tot}}\} \cdot P\{E_{\mathrm{tot}}\} \\
&\ge \tfrac{1}{2} P\{E \text{ in independent model} \mid E_{\mathrm{tot}}\}.
\end{aligned}$$

The probability can only decrease when we consecutively make the following changes:

(1) remove both occurrences of "at least" in $E_{\mathrm{tot}}$, resulting in exactly $T/2$ trials and exactly $n/2$
balls;

(2) restrict the sum in Equation (5.13) to the $T/2$ trials included in the new (exact) version of
$E_{\mathrm{tot}}$; and

(3) fix the set of $T/2$ trials in $E_{\mathrm{tot}}$; say, require that these be the first $T/2$ trials.

After doing this, the law becomes the original model with $n/2$ balls and $T/2$ trials. Hence,

$$P\{E \text{ in independent model}\} \ge \tfrac{1}{2} P\{E \text{ in original model with } n/2 \text{ balls and } T/2 \text{ trials}\}.$$
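As a rough numerical illustration of the two models compared above, the following sketch simulates both on a toy scale. The function names and the values of `n`, `N`, `T` are hypothetical choices made only for this illustration; they are not the parameters used in the proof, and the simulation does not verify the probability bounds themselves.

```python
import random

def original_model(n, N, T, rng):
    """Original model: throw n balls into N buckets uniformly and
    independently, and repeat this in T independent trials."""
    trials = []
    for _ in range(T):
        counts = [0] * N
        for _ in range(n):
            counts[rng.randrange(N)] += 1
        trials.append(counts)
    return trials

def independent_model(n, N, T, rng):
    """Independent model: each bucket count X_tl is a sum of S = n*T
    independent Bernoulli variables with success probability 1/(N*T)."""
    S = n * T
    p = 1.0 / (N * T)
    return [[sum(rng.random() < p for _ in range(S)) for _ in range(N)]
            for _ in range(T)]

def heavy_buckets(trials, threshold):
    """Number of buckets, summed over all trials, that hold more than
    `threshold` balls (the quantity bounded in Equation (5.13))."""
    return sum(1 for counts in trials for c in counts if c > threshold)

rng = random.Random(0)
n, N, T = 8, 16, 10  # hypothetical toy values, not the paper's parameters
orig = original_model(n, N, T, rng)
indep = independent_model(n, N, T, rng)
# Empirical mean bucket load in the independent model; its expectation is n/N.
mean_indep = sum(map(sum, indep)) / (N * T)
```

In the original model every trial contains exactly `n` balls, whereas in the independent model the total per trial is random with mean `n`; this is why the argument conditions on the event $E_{\mathrm{tot}}$ before comparing the two.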

Therefore, it suffices to prove that Equation (5.13) holds in the independent model, with probability
as in (5.11), i.e. with probability $1 - \frac{1}{2}e^{-\gamma m_j T}$, where $\gamma$ is as in Equation (5.12).

In order to prove Equation (5.13) with the requisite probability, we first estimate the number of
balls $X_{tl}$ in one bucket, see Equation (5.14). It is a sum of $S = nT$ independent Bernoulli random
variables with expectations $\frac{1}{NT}$. Then by the Chernoff inequality,



1 m 7 

Ij^S)-*-.. (5.15) 
I 3 mu) \12e mi/ 



m^J \ rze J m^, 

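Let us call this probability $\eta$. We have to estimate the sum in Equation (5.13), which equals
$\sum_{t=1}^{T} \sum_{l=1}^{N} \delta_{tl}$, where $\delta_{tl}$ (the indicator that bucket $l$ in trial $t$ contains more than $\frac{1}{2}\lambda_j \frac{m_j}{m_k}$ balls)
are independent Bernoulli random variables whose expectations are $\mathbb{E}\delta_{tl} \le \eta$ by Equation (5.15).
Then by the Chernoff inequality, the probability that Equation (5.13) fails to hold is

$$P\left\{\sum_{t=1}^{T} \sum_{l=1}^{N} \delta_{tl} > \varepsilon_j m_k T\right\} \le (\beta/e)^{-\varepsilon_j m_k T},$$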

where 

ejrn, = 1 = m, (_L x ™±y X ^ (5 . 16) 

rjN ?nfc 77 V12e m^J 

To complete the proof, we need to show that 



which would follow from 

{(3/e) 3 ~ > e 2 \ (5.17) 

where $\gamma$ must satisfy Equation (5.12).
Now we specify our choice of $\lambda_j$ and $\varepsilon_j$. Set

I Vmfc/ rrij C'k) 

£j := max — , — — - \. 

Lmfc Cak) 

Then Equations (5.8) and (5.9) are clearly satisfied (recall that $j < k$). Next, the base in Equation (5.16)
is estimated as

12e m/c ^TOfc/ 



and the exponent can be estimated using 



Xj— > 24. 



Also, the first factor of Equation (5.16) is estimated as



Combining these three estimates, we obtain 

l 771 i l 

~~ \rn k J Vrrifc/ ~~ ^Wfe/ 

Now we can check Equation (5.17).



We) «^ > (JS.) 

V emi. / 



em k . 

> (a/2) s by the definition of m^, 

/ln(a/2) k \ 
— ex P ( (m\2 — ' T2 ) ^ ne definition of Xj and ej 

_ e c(a)/fc_ 

Therefore Equation (5.17) holds for $\gamma = c(a)/k$, and this choice of $\gamma$ satisfies the required condition
in Equation (5.12), since $T = C(a)(k + 1) \log(d)$. □

This completes the proof of Lemma 26. □

5.4. Pruning. To this point, we have shown that the Chaining Pursuit Proper algorithm produces
an approximation $\hat{f}$ of at most $O(m)$ terms with

$$\|f - \hat{f}\|_1 \le (1 + C \log m)\|f - f_m\|_1.$$

We now show that pruning produces $\hat{f}_m$ with

$$\|f - \hat{f}_m\|_1 \le 3(1 + C \log m)\|f - f_m\|_1.$$

That is, we reduce the number of terms to exactly $m$ while increasing the error by a small constant
factor. This result applies to any approximation, not just an approximation produced by Chaining
Pursuit Proper. This is our top-level result.

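Theorem 28. Let $\hat{f}$ be an approximation to $f$ with $\|f - \hat{f}\|_1 \le B\|f - f_m\|_1$. Then

$$\|f - \hat{f}_m\|_1 \le (2B + 1)\|f - f_m\|_1.$$

Proof. We have, using the triangle inequality and optimality of $\hat{f}_m$ for $\hat{f}$,

$$\begin{aligned}
\|f - \hat{f}_m\|_1 &\le \|f - \hat{f}\|_1 + \|\hat{f} - \hat{f}_m\|_1 && \text{(triangle inequality)} \\
&\le \|f - \hat{f}\|_1 + \|\hat{f} - f_m\|_1 && \text{(optimality of } \hat{f}_m \text{, since } f_m \text{ has at most } m \text{ terms)} \\
&\le \|f - \hat{f}\|_1 + \|\hat{f} - f\|_1 + \|f - f_m\|_1 && \text{(triangle inequality)} \\
&\le (2B + 1)\|f - f_m\|_1.
\end{aligned}$$

□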
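The pruning step and the chain of inequalities above can be checked numerically. The sketch below is only an illustration: the helper names `best_m_term`, `l1_dist`, and `pruning_check` are hypothetical, pruning is assumed to mean keeping the $m$ largest-magnitude entries, and the check uses the intermediate bound $\|f - \hat{f}_m\|_1 \le 2\|f - \hat{f}\|_1 + \|f - f_m\|_1$ from the proof, which implies the $(2B+1)$ bound whenever $\|f - \hat{f}\|_1 \le B\|f - f_m\|_1$.

```python
def best_m_term(x, m):
    """Best m-term approximation in l1: keep the m largest-magnitude
    entries of x and set the rest to zero."""
    keep = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:m]
    out = [0.0] * len(x)
    for i in keep:
        out[i] = x[i]
    return out

def l1_dist(x, y):
    """l1 distance between two vectors of equal length."""
    return sum(abs(a - b) for a, b in zip(x, y))

def pruning_check(f, f_hat, m):
    """Return (error of the pruned approximation, bound from the proof)."""
    opt = l1_dist(f, best_m_term(f, m))        # ||f - f_m||_1
    err = l1_dist(f, f_hat)                    # ||f - f_hat||_1
    pruned = l1_dist(f, best_m_term(f_hat, m)) # ||f - (f_hat)_m||_1
    return pruned, 2 * err + opt

# Toy example (made-up numbers): a 5-dimensional signal pruned to m = 2 terms.
f = [5.0, -3.0, 0.5, 0.25, 0.0]
f_hat = [4.5, -3.5, 1.0, 0.0, 0.5]
pruned_err, bound = pruning_check(f, f_hat, 2)
```

Here pruning removes the two small spurious entries of `f_hat`, and the pruned error stays below the bound, as the theorem predicts for any approximation.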



5.5. Robustness. In this subsection, we prove Corollary 29. As advertised in the introduction, the
Chaining Pursuit algorithm is not only stable with respect to noise in the signal but also robust
to inaccuracy or errors in the measurements. Suppose that instead of using the sketch $\Phi f$ of the
signal $f$, we receive $V = \Phi f + y$ and we reconstruct $\hat{f}$ from $V$. We assume that once we carry out the
Chaining Pursuit algorithm, there are no perturbations to the intermediate measurements, only to
the original sketch $\Phi f$.

Corollary 29. With probability at least $1 - O(d^{-3})$, the random measurement operator $\Phi$ has
the following property. Suppose that $f$ is a $d$-dimensional signal whose best $m$-term approximation
with respect to the $\ell_1$ norm is $f_m$. Given the measurement operator $\Phi$, for every $V$ (not necessarily
the sketch $\Phi f$ of $f$), if $\hat{f}$ is the reconstruction from $V$, then

$$\|f - \hat{f}\|_1 \le C(1 + \log(m))\left(\|f - f_m\|_1 + \|\Phi f - V\|_1\right).$$

Proof. We need only make a few adjustments to the proof of the main theorem to obtain this result.
For brevity, we note these changes. Let $V = \Phi f + y$ and let us refer to $y$ as the measurement error.

Condition 30 (Chaining Recovery Conditions for Robust Isolation Matrices). A 0-1 matrix with
pass/trial hierarchical structure described in Section 2.2 (i.e., any matrix from the sample space
described in Section 2.2) is said to satisfy the Chaining Recovery Conditions if for any signal of
the form in the Invariant and for any pass $k$, at least 99/100 of the trial submatrices have
these three properties:

(1) All but $\frac{1}{100} m_{k+1}$ spikes appear alone in a measurement, isolated from the other spikes.

(2) Except for at most $\frac{1}{100} m_{k+1}$ of the measurements, the internal and external noise assigned
to each measurement has $\ell_1$ norm at most $\frac{1}{1000} m_k^{-1}$.

(3) Except for at most $\frac{1}{100} m_{k+1}$ of the measurements, the measurement error assigned to each
measurement has $\ell_1$ norm at most $\frac{1}{1000} m_k^{-1}$.

To prove that a random isolation matrix satisfies this additional property with high probability,
we use Markov's inequality to bound the number of measurements whose measurement error is large. This is the same
argument as in the second half of the proof of Lemma 25.

Next, we adjust Lemma 18 to include in the bound $\varepsilon$ not just the $\ell_1$ norm of the other positions but
also the measurement error. We also modify the definition of a good measurement in Definition 31
to include the measurement error.

Definition 31. A good measurement satisfies one of the following two criteria:

(1) The measurement is empty; that is, it contains positions with values $|f^{(k)}(i)| \le \varepsilon$ and the
total $\ell_1$ norm of the positions in the measurement plus the measurement error is less than
$1.5\varepsilon$.

(2) The measurement contains one spike at position $i$ with $|f^{(k)}(i)| > \varepsilon$ and the $\ell_1$ norm of all
other positions in this measurement plus the measurement error is less than $0.5\varepsilon$.
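The two criteria of Definition 31 can be phrased as a small predicate. The following is a minimal sketch under stated assumptions: the function name `is_good_measurement` and its arguments are hypothetical, `values` stands for the signal values at the positions assigned to one measurement, and the measurement error for that measurement is assumed to be a single scalar whose absolute value is added to the $\ell_1$ norm.

```python
def is_good_measurement(values, meas_error, eps):
    """Check the two criteria of the definition for a single measurement.

    values     -- signal values at the positions falling in this measurement
    meas_error -- measurement error attributed to this measurement (assumed scalar)
    eps        -- the spike threshold epsilon
    """
    # Positions whose magnitude exceeds the threshold count as spikes.
    spikes = [i for i, v in enumerate(values) if abs(v) > eps]
    if len(spikes) > 1:
        return False  # two or more spikes: neither criterion can hold
    # l1 norm of all non-spike positions, plus the measurement error.
    rest = sum(abs(v) for i, v in enumerate(values) if i not in spikes)
    total = rest + abs(meas_error)
    if not spikes:
        return total < 1.5 * eps  # criterion (1): empty measurement
    return total < 0.5 * eps      # criterion (2): one isolated spike
```

For example, with `eps = 1.0`, a measurement holding values `[5.0, 0.125]` and error `0.125` is good under criterion (2), while `[5.0, 4.0]` is not good because it contains two spikes.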

We conclude by noting that with the above changes, we change the corresponding lemma to include the $\ell_1$ norm
of the measurement error $\|y\|_1$, as well as the noise $\|w\|_1$. That is, after the last pass, the error
with 

K K 

m < \\w\\\ + I v\ < \\w ||i + Gmjirij 1 = || ||i + 6 log a m. 

j=0 1 j=l 

Since $a$ is a constant and $\|w\|_1$ and $\|y\|_1$ were normalized to be constant, we have that the overall
error is at most

$$(1 + C \log(m))\left(\|f - f_m\|_1 + \|\Phi f - V\|_1\right).$$

□

6. Algorithmic Dimension Reduction 

The following dimension reduction theorem holds for sparse vectors. 

Theorem 32. Let $X$ be the union of all $m$-sparse signals in $\mathbb{R}^d$ and endow $\mathbb{R}^d$ with the $\ell_1$ norm.
The linear map $\Phi : \mathbb{R}^d \to \mathbb{R}^n$ in Theorem 2 satisfies

$$A\|f - g\|_1 \le \|\Phi(f) - \Phi(g)\|_1 \le B\|f - g\|_1$$

for all $f$ and $g$ in $X$, where $1/A = C \log(m)$ and $B = C \log^2(m) \log^2(d)$ and $n = O(m \log^2 d)$.

Proof. The upper bound is equivalent to saying that the $\ell_1 \to \ell_1$ operator norm satisfies $\|\Phi\|_{1 \to 1} \le B$.
This norm is attained at an extreme point of the unit ball of $\ell_1^d$, which is thus at a point with
support 1. Then the upper bound follows at once from the definition of $\Phi$. That is, any 0-1 vector
of support 1 gets mapped by $\Phi$ to a 0-1 vector of support bounded by the total number of bit-tests
in all trials and passes, which is $\sum_{k=0}^{\log_a m} O(k \log d) \log_2 d \le B$.

The lower bound follows from Theorem 2. Let $f$ and $g$ be any $d$-dimensional signals with support
$m$, so that $f = f_m$ and $g = g_m$. Let $V = \Phi g$. Then the reconstruction $\hat{f}$ from $V$ will be exact:
$\hat{f} = g$. As proven in Corollary 29,

$$\begin{aligned}
\|f - g\|_1 &= \|f - \hat{f}\|_1 \\
&\le C \log(m)\left(\|f - f_m\|_1 + \|\Phi f - V\|_1\right) \\
&= C \log(m) \|\Phi f - \Phi g\|_1,
\end{aligned}$$

which completes the proof. □
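The upper-bound argument rests on a standard fact: the $\ell_1 \to \ell_1$ operator norm of a matrix equals its largest column $\ell_1$ norm, so it is attained at a vector of support 1. The sketch below illustrates this on a small, made-up 0-1 matrix; the matrix and the helper names are hypothetical and are not the measurement matrix $\Phi$ of the paper.

```python
def l1_operator_norm(matrix):
    """l1 -> l1 operator norm of a matrix: the maximum column l1 norm."""
    rows, cols = len(matrix), len(matrix[0])
    return max(sum(abs(matrix[r][c]) for r in range(rows)) for c in range(cols))

def apply(matrix, x):
    """Matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in matrix]

def l1(x):
    """l1 norm of a vector."""
    return sum(abs(v) for v in x)

# Hypothetical 0-1 matrix; for a 0-1 matrix each column sum is the number
# of measurements in which that position participates.
M = [
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 0],
    [0, 0, 1],
]
norm = l1_operator_norm(M)  # the third column has three ones
x = [1.0, -2.0, 0.5]
ratio_ok = l1(apply(M, x)) <= norm * l1(x)
```

For a 0-1 matrix the column sum counts the measurements that touch a given position, which is why the proof bounds the norm by the total number of bit-tests over all trials and passes.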

We are interested not only in the distortion and dimension reduction properties of our embedding
but also in the stability and robustness properties of the embedding. Our previous analysis
guarantees that $\Phi^{-1}\Phi$ is the identity on $X$ and that the inverse can be computed in sublinear
time since Chaining Pursuit Proper perfectly recovers $m$-sparse signals. Our previous analysis also
shows that our dimension reduction is stable and robust. In other words, our embedding and
the reconstruction algorithm can tolerate errors $\eta$ in the data $x \in X$, as well as errors $\nu$ in the
measurements:

Theorem 33. The linear map $\Phi : \mathbb{R}^d \to \mathbb{R}^n$ in Theorem 2 and the reconstruction map $\Psi : \mathbb{R}^n \to \mathbb{R}^d$
given by the Chaining Pursuit Proper algorithm satisfy the following for every $\eta \in \mathbb{R}^d$ and every
$\nu \in \mathbb{R}^n$ and for all $m$-sparse signals $x$ in $\mathbb{R}^d$:

$$\|x - \Psi(\Phi(x + \eta) + \nu)\|_1 \le (1 + C \log m)\left(\|\eta\|_1 + \|\nu\|_1\right).$$

Proof. This is just a reformulation of our observations in Corollary 29 with $x = f_m$, $\eta = f - f_m$,
$\nu = V - \Phi f$. □



7. Conclusions 

We have presented the first algorithm for recovery of a noisy sparse vector from a nearly optimal 
number of non-adaptive linear measurements that satisfies the following two desired properties: 

• A single uniform measurement matrix works simultaneously for all signals. 

• The recovery time is, up to log factors, proportional to the size of the output, not the length 
of the vector. 

The output of our algorithm has error with $\ell_1$-norm bounded in terms of the $\ell_1$-norm of the
optimal output. Elsewhere in the literature, e.g., in [CT04, RV06, CDD06], the $\ell_2$-norm of the
output error is bounded in terms of the $\ell_1$-norm of the optimal error, a mixed-norm guarantee that
is somewhat stronger than the result we give here. A companion paper, in progress, addresses this
as well as the logarithmic factor in the approximation error that we give here.

If the measurement matrix is a random Gaussian matrix, as in [CT04, RV06, CDD06], the
measurement matrix distribution is invariant under unitary transformations. It follows that such
algorithms support recovery of signals that are sparse in a basis unknown at measurement time.
That is, one can measure a signal $f^*$ as $V = \Phi f^*$. Later, one can decide that $f^*$ can be written as
$f^* = Sf$, where $S$ is an arbitrary unitary matrix independent of $\Phi$ and $f$ is a noisy sparse vector
of the form discussed above. Thus $V = (\Phi S)f$, where $\Phi S$ is Gaussian, of the type required by the
recovery algorithm. Thus, given $V$, $\Phi$, and $S$, the algorithms of [CT04, RV06, CDD06] can recover
$f$.

If the matrix $S$ is known at measurement time, our algorithm can substitute $\Phi S^{-1}$ for $\Phi$ at
measurement time and proceed without further changes. If $S$ is unknown at measurement time,
however, our algorithm breaks down. But note that an important point of our algorithm is to
provide decoding in time $m \operatorname{polylog}(d)$, which is clearly not possible if the decoding process must
first read an arbitrary unitary $d$-by-$d$ matrix $S$. Once a proper problem has been formulated, it
remains interesting and open whether sublinear-time decoding is compatible with a basis of sparsity
that is unknown at measurement time.